Access to another struct by value or by pointer - struct

What is the difference between accessing another struct by value and by pointer?
When should each one be used?
type foo_ struct {
	st uint8
	nd uint8
}
type bar struct {
	rd  uint8
	foo foo_
}
type barP struct {
	rd  uint8
	foo *foo_
}

If you declare or allocate a variable of type bar, you reserve and initialize to zero memory for both rd uint8 and foo foo_. There is always one variable of type foo_ embedded in a variable of type bar.
var b bar // declare b
If you declare or allocate a variable of type barP, you reserve and initialize to zero memory for both rd uint8 and foo *foo_. A zero value pointer is a nil pointer. No variable of type foo_ is allocated; you must do that separately. There is either zero (foo == nil) or one variable of type foo_ pointed to by a variable of type barP. A variable of type barP may point to the same variable of type foo_ as other variables of type barP, sharing the same copy of the variable of type foo_. A change to a shared copy is seen by all variables that point to it.
var bp barP // declare bp
bp.foo = new(foo_) // allocate bp.foo
Which one to use depends on the properties of type bar versus type barP. Which type more closely reflects the problem that you are trying to solve?
For example, consider this invoice problem. We always have a billing address; we are always going to ask for our money. However, we often ship to the billing address, but not always. If the shipping address is nil, use the billing address. Otherwise, use a separate shipping address. We have two warehouses, and we always ship from one or the other. We can share the two warehouse locations. Since we don't send an invoice until the order is shipped from the warehouse, the warehouse location will never be nil.
type address struct {
	street string
	city   string
}
type warehouse struct {
	address string
}
type invoice struct {
	name      string
	billing   address
	shipping  *address
	warehouse *warehouse
}

The answer is largely independent of language - the equivalent in C has the same issues.
When you have an embedded value (as in bar), then your structure is big enough to hold the complete sub-structure and the other part.
When you have a pointer to a value (as in barP), then a number of structures of type barP may share the same foo. When any of the barP modifies a part of the foo it points to, it affects all the other barP structures that point to the same place. Also, as the commentary suggests, you have to manage two separate objects - the barP and the foo as against one with the plain bar type.
In some languages, you would have to worry about dangling pointers and uninitialized values etc; Go is garbage collected and generally more type-safe than other languages.
So, use a pointer when you want multiple barP objects to share the same foo object; otherwise, use an explicit member object, rather than a pointer to an object.

The Golang FAQ now summarizes the difference between:
func (s *MyStruct) pointerMethod() { } // method on pointer
func (s MyStruct) valueMethod() { } // method on value
First, and most important, does the method need to modify the receiver?
If it does, the receiver must be a pointer. (Slices and maps are reference types, so their story is a little more subtle, but for instance to change the length of a slice in a method the receiver must still be a pointer.)
In the examples above, if pointerMethod modifies the fields of s, the caller will see those changes, but valueMethod is called with a copy of the caller's argument (that's the definition of passing a value), so changes it makes will be invisible to the caller.
By the way, pointer receivers are identical to the situation in Java, although in Java the pointers are hidden under the covers; it's Go's value receivers that are unusual.
Second is the consideration of efficiency. If the receiver is large, a big struct for instance, it will be much cheaper to use a pointer receiver.
(This efficiency point is also illustrated in "Memory, variables in memory, and pointers ")
Next is consistency. If some of the methods of the type must have pointer receivers, the rest should too, so the method set is consistent regardless of how the type is used. See the section on method sets for details.

Related

Rust behavior after move [duplicate]

The Rust language website claims move semantics as one of the features of the language. But I can't see how move semantics is implemented in Rust.
Rust boxes are the only place where move semantics are used.
let x = Box::new(5);
let y: Box<i32> = x; // x is 'moved'
The above Rust code can be written in C++ as
auto x = std::make_unique<int>(5);
auto y = std::move(x); // Note the explicit move
As far as I know (correct me if I'm wrong),
Rust doesn't have constructors at all, let alone move constructors.
No support for rvalue references.
No way to create functions overloads with rvalue parameters.
How does Rust provide move semantics?
I think it's a very common issue when coming from C++. In C++ you are doing everything explicitly when it comes to copying and moving. The language was designed around copying and references. With C++11 the ability to "move" stuff was glued onto that system. Rust on the other hand took a fresh start.
Rust doesn't have constructors at all, let alone move constructors.
You do not need move constructors. Rust moves everything that "does not have a copy constructor", a.k.a. "does not implement the Copy trait".
struct A;
fn test() {
    let a = A;
    let b = a;
    let c = a; // error, a is moved
}
Rust's default constructor is (by convention) simply an associated function called new:
struct A(i32);
impl A {
    fn new() -> A {
        A(5)
    }
}
More complex constructors should have more expressive names. This is the named constructor idiom from C++.
No support for rvalue references.
It has always been a requested feature, see RFC issue 998, but most likely you are asking for a different feature: moving stuff to functions:
struct A;
fn move_to(a: A) {
    // a is moved into here, you own it now.
}
fn test() {
    let a = A;
    move_to(a);
    let c = a; // error, a is moved
}
No way to create functions overloads with rvalue parameters.
You can do that with traits.
trait Ref {
    fn test(&self);
}
trait Move {
    fn test(self);
}
struct A;
impl Ref for A {
    fn test(&self) {
        println!("by ref");
    }
}
impl Move for A {
    fn test(self) {
        println!("by value");
    }
}
fn main() {
    let a = A;
    (&a).test(); // prints "by ref"
    a.test(); // prints "by value"
}
Rust's moving and copying semantics are very different from C++. I'm going to take a different approach to explain them than the existing answer.
In C++, copying is an operation that can be arbitrarily complex, due to custom copy constructors. Rust doesn't want custom semantics of simple assignment or argument passing, and so takes a different approach.
First, an assignment or argument passing in Rust is always just a simple memory copy.
let foo = bar; // copies the bytes of bar to the location of foo (might be elided)
function(foo); // copies the bytes of foo to the parameter location (might be elided)
But what if the object controls some resources? Let's say we are dealing with a simple smart pointer, Box.
let b1 = Box::new(42);
let b2 = b1;
At this point, if just the bytes are copied over, wouldn't the destructor (drop in Rust) be called for each object, thus freeing the same pointer twice and causing undefined behavior?
The answer is that Rust moves by default. This means that it copies the bytes to the new location, and the old object is then gone. It is a compile error to access b1 after the second line above. And the destructor is not called for it. The value was moved to b2, and b1 might as well not exist anymore.
This is how move semantics work in Rust. The bytes are copied over, and the old object is gone.
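To make the "old object is gone" point concrete, here is a minimal sketch that counts destructor calls. The Tracked type and the drops_after_moves function are hypothetical names invented for this illustration; they are not part of any library.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical resource type that records every call to its destructor.
struct Tracked {
    drops: Rc<Cell<u32>>,
}

impl Drop for Tracked {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

/// Moves one `Tracked` value twice and returns how many times `drop` ran.
fn drops_after_moves() -> u32 {
    let drops = Rc::new(Cell::new(0));
    {
        let t1 = Tracked { drops: Rc::clone(&drops) };
        let t2 = t1; // the bytes are copied into t2; t1 can no longer be used
        let _t3 = t2; // moved again; there is still only one live value
        // At the end of this block only `_t3` is dropped.
        // No destructor runs for the moved-from `t1` and `t2`.
    }
    drops.get()
}

fn main() {
    println!("drop ran {} time(s)", drops_after_moves());
}
```

Even though the value passed through three variables, the counter ends at 1, because only the final owner is responsible for cleanup.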
In some discussions about C++'s move semantics, Rust's way was called "destructive move". There have been proposals to add the "move destructor" or something similar to C++ so that it can have the same semantics. But move semantics as they are implemented in C++ don't do this. The old object is left behind, and its destructor is still called. Therefore, you need a move constructor to deal with the custom logic required by the move operation. Moving is just a specialized constructor/assignment operator that is expected to behave in a certain way.
So by default, Rust's assignment moves the object, making the old location invalid. But many types (integers, floating points, shared references) have semantics where copying the bytes is a perfectly valid way of creating a real copy, with no need to ignore the old object. Such types should implement the Copy trait, which can be derived by the compiler automatically.
#[derive(Copy, Clone)] // Copy has Clone as a supertrait, so both must be derived
struct JustTwoInts {
    one: i32,
    two: i32,
}
This signals the compiler that assignment and argument passing do not invalidate the old object:
let j1 = JustTwoInts { one: 1, two: 2 };
let j2 = j1;
println!("Still allowed: {}", j1.one);
Note that trivial copying and the need for destruction are mutually exclusive; a type that is Copy cannot also be Drop.
Now what about when you want to make a copy of something where just copying the bytes isn't enough, e.g. a vector? There is no language feature for this; technically, the type just needs a function that returns a new object that was created the right way. But by convention this is achieved by implementing the Clone trait and its clone function. In fact, the compiler supports automatic derivation of Clone too, where it simply clones every field.
#[derive(Clone)]
struct JustTwoVecs {
    one: Vec<i32>,
    two: Vec<i32>,
}
let j1 = JustTwoVecs { one: vec![1], two: vec![2, 2] };
let j2 = j1.clone();
And whenever you derive Copy, you must also derive Clone: Copy has Clone as a supertrait, and containers like Vec use Clone internally when they are cloned themselves.
#[derive(Copy, Clone)]
struct JustTwoInts { /* as before */ }
Now, are there any downsides to this? Yes, in fact there is one rather big downside: because moving an object to another memory location is just done by copying bytes, and no custom logic, a type cannot have references into itself. In fact, Rust's lifetime system makes it impossible to construct such types safely.
But in my opinion, the trade-off is worth it.
Rust supports move semantics with features like these:
All types are moveable.
Sending a value somewhere is a move, by default, throughout the language. For non-Copy types, like Vec, the following are all moves in Rust: passing an argument by value, returning a value, assignment, pattern-matching by value.
You don't have std::move in Rust because it's the default. You're really using moves all the time.
Rust knows that moved values must not be used. If you have a value x: String and do channel.send(x), sending the value to another thread, the compiler knows that x has been moved. Trying to use it after the move is a compile-time error, "use of moved value". And you can't move a value while anyone holds a reference to it (that would leave a dangling pointer).
Rust knows not to call destructors on moved values. Moving a value transfers ownership, including responsibility for cleanup. Types don't have to be able to represent a special "value was moved" state.
Moves are cheap and the performance is predictable. It's basically memcpy. Returning a huge Vec is always fast—you're just copying three words.
The Rust standard library uses and supports moves everywhere. I already mentioned channels, which use move semantics to safely transfer ownership of values across threads. Other nice touches: all types support copy-free std::mem::swap() in Rust; the Into and From standard conversion traits are by-value; Vec and other collections have .drain() and .into_iter() methods so you can smash one data structure, move all the values out of it, and use those values to build a new one.
Rust doesn't have move references, but moves are a powerful and central concept in Rust, providing a lot of the same performance benefits as in C++, and some other benefits as well.
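As a rough sketch of the standard-library points above, the following combines returning a Vec by value, std::mem::swap, .drain() and .into_iter(). The helper names make_noodles, shout and demo are made up for this example.

```rust
/// Returning a Vec by value moves its small header to the caller;
/// the heap-allocated elements are not copied.
fn make_noodles() -> Vec<String> {
    vec!["udon".to_string(), "ramen".to_string(), "soba".to_string()]
}

/// Takes ownership of `v` and moves every String out of it.
fn shout(v: Vec<String>) -> Vec<String> {
    v.into_iter().map(|s| s.to_uppercase()).collect()
}

/// Returns (drained elements, what is left behind, upper-cased vector).
fn demo() -> (Vec<String>, Vec<String>, Vec<String>) {
    let mut a = make_noodles();
    let mut b = vec!["pho".to_string()];

    // Exchanges the two vectors without cloning any element.
    std::mem::swap(&mut a, &mut b);
    // Now a == ["pho"] and b == ["udon", "ramen", "soba"].

    // Moves the first two strings out of b, leaving the rest in place.
    let drained: Vec<String> = b.drain(..2).collect();

    // `a` is moved into `shout`; using `a` after this line would not compile.
    let loud = shout(a);

    (drained, b, loud)
}

fn main() {
    let (drained, rest, loud) = demo();
    println!("{:?} {:?} {:?}", drained, rest, loud);
}
```

None of these steps clones a String; each value has exactly one owner at every point.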
let s = vec!["udon".to_string(), "ramen".to_string(), "soba".to_string()];
(Figure: how s is represented in memory.)
Then let's assign s to t:
let t = s;
(Figure: memory after the assignment.)
let t = s moved the vector’s three header fields from s to t; now t is the owner of the vector. The vector’s elements stayed just where they were, and nothing happened to the strings either. Every value still has a single owner.
Now s is uninitialized (it has not been freed; its contents simply belong to t). If I write this:
let u = s;
I get the error "use of moved value: s".
Rust applies move semantics to almost any use of a value (except Copy types). Passing arguments to functions moves ownership to the function’s parameters; returning a value from a function moves ownership to the caller. Building a tuple moves the values into the tuple. And so on.
Reference: Programming Rust by Jim Blandy, Jason Orendorff, Leonora F. S. Tindall
Primitive types such as integers have a fixed size and cannot be empty, while non-primitive types such as String and Vec can grow and can be empty. Because primitives have a fixed size, allocating memory for them and handling them is relatively easy. Handling non-primitives involves tracking how much heap memory they use as they grow, along with other costly operations. With primitives Rust makes a copy; with non-primitives Rust does a move. (More precisely, the rule is whether the type implements Copy.)
fn main() {
    // x is stored on the stack; primitive types have a fixed size
    let x: i32 = 10;
    // The bytes of "my name" live on the heap; s1 (pointer, length, capacity) lives on the stack.
    // s1 owns that heap memory, and Rust uses ownership to decide when to free it.
    let s1 = String::from("my name");
    // Copying the string would cost an extra allocation, so ownership is MOVED from s1 to s2:
    // s2 now points to the same heap memory and s1 can no longer be used.
    let s2 = s1;
    // s3 is a reference to s2: we borrow the value and ownership does not change,
    // much like borrowing a car from a friend.
    let s3 = &s2;
    // clone() allocates a new "my name" on the heap, owned by s4
    let s4 = s2.clone();
}
Same principle applies when we pass primitive or non-primitive type arguments to a function:
fn main() {
    // i32 is Copy, so stack_var_fn receives a copy and stack_num stays unchanged
    let stack_num = 50;
    let heap_vec = vec![2, 3, 4];
    // passing a Copy value copies it; no move occurs here
    stack_var_fn(stack_num);
    println!("The stack_num inside the main fn did not change:{}", stack_num);
    // Passing heap_vec by value would move ownership into the function, and the vector
    // would be dropped when the function returns. Passing a reference with "&" lends
    // the vector instead, so main still owns it afterwards.
    heap_var_fn(&heap_vec);
    println!("the heap_vec inside main is:{:?}", heap_vec);
}
// takes an argument that is stored on the stack
fn stack_var_fn(mut var: i32) {
    // changes only the local copy of the argument
    var = 56;
    println!("Var inside stack_var_fn is :{}", var);
}
// takes a reference to a vector whose elements are stored on the heap
fn heap_var_fn(var: &Vec<i32>) {
    println!("Var:{:?}", var);
}
I would like to add that a move does not necessarily compile to a memcpy. If the object on the stack is large enough, Rust's compiler may choose to pass a pointer to the object instead.
In C++ the default assignment of classes and structs is a shallow copy. The values are copied, but not the data referenced by pointers. So modifying the referenced data through one instance changes it for all copies, while the plain values (used, for example, for bookkeeping) remain unchanged in the other instances, likely leaving them in an inconsistent state. Move semantics avoid this situation. Here is an example of a C++ implementation of a memory-managed container with move semantics:
template <typename T>
class object
{
    T *p;
public:
    object()
    {
        p = new T;
    }
    ~object()
    {
        if (p != (T *)0) delete p;
    }
    template <typename V> //type V is used to allow for conversions between reference and value
    object(object<V> &v) //copy constructor with move semantic
    {
        p = v.p; //move ownership
        v.p = (T *)0; //make sure it does not get deleted
    }
    object &operator=(object<T> &v) //move assignment
    {
        delete p;
        p = v.p;
        v.p = (T *)0;
        return *this;
    }
    T &operator*() { return *p; } //reference to object *d
    T *operator->() { return p; } //pointer to object data d->
};
Such an object frees its memory automatically when it goes out of scope, and it can be returned from functions to the calling program. It is efficient and does much the same as Rust does at run time, although unlike in Rust the compiler will not reject a later use of the moved-from object:
object<somestruct> somefn() //function returning an object
{
    object<somestruct> a;
    auto b = a; //move semantic; a becomes invalid
    return b; //this moves the object to the caller
}
auto c = somefn();
//now c owns the data; memory is freed after leaving the scope

What is the behavior of AtomicPtr::compare_exchange when used on a pointer to a struct?

Does this usage of compare_exchange produce defined behavior?
use std::sync::atomic::{AtomicPtr, Ordering};
struct Dummy {
    foo: i64,
    bar: i64,
}
fn main() {
    let ptr = &mut Dummy { foo: 1, bar: 2 };
    let some_ptr = AtomicPtr::new(ptr);
    let other_ptr = &mut Dummy { foo: 10, bar: 10 };
    let value = some_ptr.compare_exchange(ptr, other_ptr, Ordering::SeqCst, Ordering::Relaxed);
}
If it is defined, what is the defined behavior? Will Rust use double width CAS for the above operation on supported architectures like x86_64?
Does this usage of compare_exchange produce defined behavior?
Barring a bug in the compiler or the standard library, safe Rust should never result in undefined behavior in the C and C++ sense. Additionally, creating and manipulating pointers is always safe, it's only dereferencing them or converting them to references that requires explicit unsafe because the programmer needs to provide guarantees that the compiler cannot verify. Whether a particular call to a safe function does what you expect is another matter, of course.
If it is defined, what is the defined behavior?
The behavior is the one specified by the documentation: the address some_ptr points to is compared to the one ptr points to and, if they match, some_ptr is atomically updated to point to the address provided by other_ptr. The previous pointer is returned in either case, wrapped in Ok on success or Err on failure. Since your code initializes some_ptr from ptr and doesn't create a thread that could change it, the call will leave some_ptr pointing to the address provided by other_ptr and return Ok(ptr).
Will Rust use double width CAS for the above operation on supported architectures like x86_64?
AtomicPtr::compare_exchange() only compares and swaps the pointer, which is as wide as usize (64 bits on modern hardware) and there is no reason for double-width CAS.
But I think I understand how the confusion might have arisen: the docs of compare_exchange speak of value of a pointer. This "value" is not the value of the pointed-to data, it's just the underlying *mut T pointer, which only represents a memory address. Just like an AtomicBool's value is bool, an AtomicPtr value is a pointer.
The docs would be slightly more precise if they spoke of the "address <x> points to" instead of "value of <x>", but that's more verbose. (The shorter "address of <pointer>" would again be ambiguous because it might be understood to refer to the address where the pointer itself is held.)
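A small sketch may help show that only addresses are compared and swapped, never the pointed-to structs. It reuses the Dummy struct from the question; cas_demo and pointer_sized are hypothetical helper names for this illustration.

```rust
use std::mem::size_of;
use std::sync::atomic::{AtomicPtr, Ordering};

struct Dummy {
    foo: i64,
    bar: i64,
}

/// Returns (first CAS swapped, second CAS was rejected, pointees untouched).
fn cas_demo() -> (bool, bool, bool) {
    let mut first = Dummy { foo: 1, bar: 2 };
    let mut second = Dummy { foo: 10, bar: 10 };
    let p1: *mut Dummy = &mut first;
    let p2: *mut Dummy = &mut second;

    let atomic = AtomicPtr::new(p1);

    // Succeeds: the stored address equals p1, so it is replaced by p2
    // and the previous address comes back as Ok(p1).
    let ok = atomic.compare_exchange(p1, p2, Ordering::SeqCst, Ordering::Relaxed);
    let swapped = ok == Ok(p1) && atomic.load(Ordering::SeqCst) == p2;

    // Fails: the stored address is now p2, not p1, so nothing changes
    // and the actual address comes back as Err(p2).
    let err = atomic.compare_exchange(p1, p2, Ordering::SeqCst, Ordering::Relaxed);
    let rejected = err == Err(p2);

    // The structs themselves were never read or written by either call.
    let untouched =
        first.foo == 1 && first.bar == 2 && second.foo == 10 && second.bar == 10;

    (swapped, rejected, untouched)
}

/// An AtomicPtr is exactly one pointer wide, whatever the size of the pointee.
fn pointer_sized() -> bool {
    size_of::<AtomicPtr<Dummy>>() == size_of::<usize>()
}

fn main() {
    println!("{:?} {}", cas_demo(), pointer_sized());
}
```

No unsafe is needed because the raw pointers are only compared and stored, never dereferenced.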

What are the differences between the multiple ways to create zero-sized structs?

I found four different ways to create a struct with no data:
struct A{} // empty struct / empty braced struct
struct B(); // empty tuple struct
struct C(()); // unit-valued tuple struct
struct D; // unit struct
(I'm leaving arbitrarily nested tuples that contain only ()s and single-variant enum declarations out of the question, as I understand why those shouldn't be used).
What are the differences between these four declarations? Would I use them for specific purposes, or are they interchangeable?
The book and the reference were surprisingly unhelpful. I did find this accepted RFC (clarified_adt_kinds) which goes into the differences a bit, namely that the unit struct also declares a constant value D and that the tuple structs also declare constructors B() and C(_: ()). However it doesn't offer a design guideline on why to use which.
My guess would be that when I export them with pub, there are differences in which kinds can actually be constructed outside of my module, but I found no conclusive documentation about that.
There are only two functional differences between these four definitions (and a fifth possibility I'll mention in a minute):
Syntax (the most obvious). mcarton's answer goes into more detail.
When the struct is marked pub, whether its constructor (also called struct literal syntax) is usable outside the module it's defined in.
The only one of your examples that is not directly constructible from outside the current module is C. If you try to do this, you will get an error:
mod stuff {
    pub struct C(());
}
let _c = stuff::C(()); // error[E0603]: tuple struct `C` is private
This happens because the field is not marked pub; if you declare C as pub struct C(pub ()), the error goes away.
There's another possibility you didn't mention that gives a marginally more descriptive error message: a normal struct, with a zero-sized non-pub member.
mod stuff {
    pub struct E {
        _dummy: (),
    }
}
let _e = stuff::E { _dummy: () }; // error[E0451]: field `_dummy` of struct `main::stuff::E` is private
(Again, you can make the _dummy field available outside of the module by declaring it with pub.)
Since E's constructor is only usable inside the stuff module, stuff has exclusive control over when and how values of E are created. Many structs in the standard library take advantage of this, like Box (to take an obvious example). Zero-sized types work in exactly the same way; in fact, from outside the module it's defined in, the only way you would know that an opaque type is zero-sized is by calling mem::size_of.
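A minimal sketch of that pattern, assuming the module chooses to expose a constructor function (the new function here is an assumption made for illustration, not something from the question):

```rust
mod stuff {
    /// Zero-sized, but it can only be built through `E::new`,
    /// because the `_dummy` field is private to this module.
    pub struct E {
        _dummy: (),
    }

    impl E {
        /// The single place where values of `E` are created.
        pub fn new() -> E {
            E { _dummy: () }
        }
    }
}

/// Builds an `E` from outside the module and reports the size of the type.
fn opaque_size() -> usize {
    let _e = stuff::E::new();
    // `stuff::E { _dummy: () }` would not compile here: the field is private.
    std::mem::size_of::<stuff::E>()
}

fn main() {
    println!("size of E: {}", opaque_size());
}
```

From the outside, E behaves like any other opaque type; mem::size_of is the only thing that reveals it occupies no space.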
See also
What is an idiomatic way to create a zero-sized struct that can't be instantiated outside its crate?
Why define a struct with single private field of unit type?
struct D; // unit struct
This is the usual way for people to write a zero-sized struct.
struct A{} // empty struct / empty braced struct
struct B(); // empty tuple struct
These are just special cases of the basic struct and tuple struct that happen to have no fields. RFC 1506 explains the rationale for allowing them (they did not use to be allowed):
Permit tuple structs and tuple variants with 0 fields. This restriction is artificial and can be lifted trivially. Macro writers dealing with tuple structs/variants will be happy to get rid of this one special case.
As such, they could easily be generated by macros, but people will rarely write those on their own.
struct C(()); // unit-valued tuple struct
This is another special case of tuple struct. In Rust, () is a type just like any other type, so struct C(()); isn't much different from struct E(u32);. While the type itself isn't very useful, forbidding it would make yet another special case that would need to be handled in macros or generics (struct F<T>(T) can of course be instantiated as F<()>).
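For comparison, here is a short sketch that constructs a value of each of the four declarations from the question and checks their sizes. The sizes function is a hypothetical helper for this example.

```rust
use std::mem::size_of;

struct A {} // empty braced struct
struct B(); // empty tuple struct
struct C(()); // unit-valued tuple struct
struct D; // unit struct

/// Constructs one value of each kind and returns the four type sizes.
fn sizes() -> [usize; 4] {
    let _a = A {};
    let _b = B();
    let _c = C(());
    let _d = D;
    [size_of::<A>(), size_of::<B>(), size_of::<C>(), size_of::<D>()]
}

fn main() {
    println!("{:?}", sizes());
}
```

The only visible difference inside the defining module is the construction syntax; all four types are zero-sized.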
Note that there are many other ways to have empty types in Rust. E.g., it is possible to have a function return Result<(), !> to indicate that it doesn't produce a value and cannot fail. While you might think that returning () in that case would be better, you might have to do that if you implement a trait that requires you to return Result<T, E> but lets you choose T = () and E = !.

What does () mean as an argument in a function where a parameter of type T is expected?

I am new to Rust and I was reading the Dining Philosophers' tutorial when I found this:
Mutex::new(())
I don't know what the argument inside new means. I read the documentation for Mutex and I still have no idea what it means. I would appreciate an explanation about what is happening under the hood.
() is the empty tuple, also called the unit type -- a tuple with no member types. It is also the only valid value of said type. It has a size of zero (note that it is still Sized, just with a size of 0), making it nonexistent at runtime. This has several useful effects, one of which is being used here.
Here, () is used to create a Mutex with no owned data -- it's just an unlockable and lockable mutex. If we explicitly write out the type inference with the turbofish operator ::<>, we could also write:
Mutex::<()>::new( () )
That is, we're creating a new Mutex that contains a () with the initial value ().
() is simply a tuple with no values; a 0-tuple. The type and the value are spelled the same, both (). The type is sometimes known as the "unit type"; it used to actually be a distinct type in the compiler, but now is just treated as a degenerate tuple. It is a 0-sized type; objects of this type won't ever actually take up any space, though it is a Sized type, just with a size of 0.
It is used for cases where you need to have a value or a type, but you have nothing relevant to put there. For instance, if you have a function that doesn't return a value, and call it in a place that expects a value, you find that it actually returns the value () of type ().
fn nothing() {}
fn main() {
    println!("{:?}", nothing());
}
That prints ().
Another use is when you have a generic type like Result<T, E>, which indicates the success or failure of some operation and can hold either the result of the successful operation or an error indicating why it failed. Some operations, such as std::io::Write::write_all, have no value to return on success but still need to be able to report an error; they return std::io::Result<()>, which is a synonym for Result<(), std::io::Error>. That allows the function to return Ok(()) in the success case, but some meaningful error when it fails.
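As an illustration, here is a small function whose success case carries no data. The greet function is a hypothetical example; write_all and flush are methods of the standard std::io::Write trait, and both return io::Result<()>.

```rust
use std::io::{self, Write};

/// Writes a greeting into any writer. Success carries no data, so the
/// function returns Ok(()); failures are reported as io::Error.
fn greet<W: Write>(out: &mut W) -> io::Result<()> {
    out.write_all(b"hello\n")?;
    out.flush()?;
    Ok(())
}

fn main() {
    // Vec<u8> implements Write, so it works as an in-memory sink.
    let mut buf: Vec<u8> = Vec::new();
    let result = greet(&mut buf);
    println!("{:?} {:?}", result, String::from_utf8_lossy(&buf));
}
```

The ? operator passes any error through, and Ok(()) is the only thing left to say when nothing went wrong.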
You might compare it to void in C or C++, which is also used for a lack of return value. However, you cannot ever write an object that has type void, which makes void much less useful in generic programming; you could never have an equivalent Result<void, Error> type, because you couldn't ever construct the Ok case.
In this case, a Mutex normally wraps an object that you want to access; you put that object into the mutex, and then access it through the guard that you get when you lock the mutex. However, in this example there is no actual data being guarded, so () is used since you need to put something in there, and Mutex is generic over the type so it can accept any type.
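Here is a rough sketch of a Mutex<()> used purely as a lock, in the spirit of the tutorial. The run_workers function and the worker closure are invented for this example and are not the tutorial's own code.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Spawns `n` threads that take turns holding a data-less lock,
/// and returns how many of them finished.
fn run_workers(n: usize) -> usize {
    let lock = Arc::new(Mutex::new(()));
    let mut handles = Vec::new();

    for id in 0..n {
        let lock = Arc::clone(&lock);
        handles.push(thread::spawn(move || {
            // The guard dereferences to (), so there is nothing to read;
            // holding the guard is the whole point.
            let guard = lock.lock().unwrap();
            let _unit: &() = &*guard;
            println!("worker {} holds the lock", id);
            // The guard is dropped here, which unlocks the mutex.
        }));
    }

    handles.into_iter().map(|h| h.join().unwrap()).count()
}

fn main() {
    println!("{} workers finished", run_workers(4));
}
```

Only one worker at a time can be between lock() and the end of its closure, even though the mutex protects no data of its own.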

Set of structs in Go

If I have a number of structs that I want to store:
type Stuff struct {
	a string
	b string
}
I can do it with a slice, but it seems like it would use less memory to use a proper set structure.
Unfortunately Go doesn't have a set structure. Everyone recommends using map[Stuff]struct{} but that doesn't work because Stuff is a struct. Anyone have any good solutions? Ideally without having to download a library.
Usually set and map data structures require more memory than storing a list of values in a plain array or slice, because sets and maps efficiently provide additional features such as uniqueness or value retrieval by key.
If you want minimal memory usage, simply store them in a slice, e.g. []Stuff. If you use the values in multiple places, it might also be profitable to store just pointers to them, e.g. []*Stuff, so that each place that stores the same Stuff value can store the same pointer (without duplicating the value).
If you only want to store unique struct values, then indeed the set would be the most convenient choice, in Go realized with a map.
There's nothing wrong with map[Stuff]struct{}, it works. The requirement for the key type for maps:
The comparison operators == and != must be fully defined for operands of the key type; thus the key type must not be a function, map, or slice.
Stuff is a struct, and structs in Go are comparable if:
Struct values are comparable if all their fields are comparable. Two struct values are equal if their corresponding non-blank fields are equal.
If your Stuff struct is what you posted, it is comparable: it only contains fields of the comparable type string.
Also note that if you want a set data structure, it's clearer to use bool as the value type (e.g. map[Stuff]bool) with true as the value. You can then simply use indexing to test whether a value is in the map: the index expression yields the zero value of the value type (false for bool) if the key (Stuff in your case) is not in the map, correctly telling you that the value you're looking for is not in the "set". If it is in the map, the result of the index expression is its associated true value, correctly telling you that it is in the set.
As icza said, a map of structs is a valid option.
But instead of implementing the Set yourself, I would use one of the new generic implementations that are out there since Go 1.18.
See this one for example: https://github.com/amit7itz/goset
package main
import (
	"fmt"
	"github.com/amit7itz/goset"
)
type Stuff struct {
	a string
	b string
}
func main() {
	set := goset.NewSet[Stuff]()
	set.Add(Stuff{a: "1", b: "2"})
	set.Add(Stuff{a: "2", b: "3"})
	set.Add(Stuff{a: "2", b: "3"})
	fmt.Println(set) // Set[main.Stuff]{{2 3} {1 2}}
	fmt.Println(set.Len()) // 2
}
