Why does Rust allow declaring the same variable name twice in a scope? [duplicate] - rust

This is the first time I have encountered a typed language that allows a variable name to be declared twice in the same scope. Isn't there a risk of overriding an existing variable by mistake? What advantage does it bring?

There is a chapter in the book about this.
Shadowing is different from marking a variable as mut, because we’ll get a compile-time error if we accidentally try to reassign to this variable without using the let keyword. By using let, we can perform a few transformations on a value but have the variable be immutable after those transformations have been completed.
The other difference between mut and shadowing is that because we’re effectively creating a new variable when we use the let keyword again, we can change the type of the value but reuse the same name. For example, say our program asks a user to show how many spaces they want between some text by inputting space characters, but we really want to store that input as a number:
let spaces = " "; // &str (a string slice)
let spaces = spaces.len(); // usize (the length in bytes)
In short, shadowing lets you "modify" a value in a way that is technically immutable: each let creates a new binding, and the old one can no longer be reached by that name, so the whole thing stays perfectly type-safe.
I'm no Rust expert, but from a language design perspective it's an interesting thing to encourage. I think the point is to discourage mutable values wherever possible by letting you immutably rebind a name to a new type and value.
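To make the difference concrete, here is a minimal sketch. The function names indent_width and shadow_in_block are made up for illustration and are not from the book. It shows a name being rebound to a new type without mut, and a shadow that ends together with its block:

```rust
/// Hypothetical helper: turns a line of space characters into a width.
fn indent_width(input: &str) -> usize {
    let spaces = input; // &str: the raw input
    let spaces = spaces.trim_end_matches('\n'); // &str: trailing newline removed
    let spaces = spaces.len(); // usize: same name, new type
    // `spaces` was never declared `mut`, so `spaces += 1;` would not compile here.
    spaces
}

/// Returns (value seen inside the block, value seen after it).
fn shadow_in_block() -> (i32, i32) {
    let x = 5;
    let inner = {
        let x = x * 2; // shadows the outer `x` only until the block ends
        x
    };
    (inner, x) // the outer `x` was never modified
}

fn main() {
    println!("{}", indent_width("   \n")); // prints 3
    println!("{:?}", shadow_in_block()); // prints (10, 5)
}
```

Every binding above is immutable; the only thing that changes is which binding the name refers to.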

Related

Rust backpointers [duplicate]

I am learning Rust from a C++/Java background, and I have the following pattern
struct Node<'a> {
    network_manager: NetworkManager<'a>,
}
struct NetworkManager<'a> {
    base_node: &'a Node<'a>,
}
The node contains the thread pool that the NetworkManager uses to "hand off" messages once they've been processed. Because of the recursive call, it is not possible to set the base_node field in the NetworkManager immediately. In Java, I would leave it as null and add a second method, initialise(BaseNode node), that is called after the constructor and sets the base_node field (ensuring that there are no calls to the network manager before initialise is called).
What is the idiomatic way of doing this in Rust? The only way I can think of is to make base_node an Option type, but this seems suboptimal.
In general, what is the "right" way in Rust to deal with situations where A points to B and B points to A, and where (as in my case), refactoring is not possible?
In my experience, these situations work out very differently in Rust than in other languages. In "safe, simple, everyday Rust", having backpointers or pointers within a struct is complex because it leads to non-trivial problems. (Consider what would happen if you moved Node around in memory: how would you properly update the backpointer in NetworkManager?)
What I usually resort to is simply passing base_node as a parameter to the functions that need the backpointer. This is sometimes easier said than done, but leads to code that clearly states ownership.
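As a rough sketch of that approach: the types below are hypothetical stand-ins for the ones in the question, with WorkQueue playing the role of the thread pool. The manager stores no backpointer at all. Instead, the node lends it the one piece of state it needs for the duration of each call:

```rust
/// Stand-in for the thread pool from the question (hypothetical).
struct WorkQueue {
    jobs: Vec<String>,
}

impl WorkQueue {
    fn submit(&mut self, job: String) {
        self.jobs.push(job);
    }
}

/// Holds no reference back to `Node`, so it needs no lifetime parameter.
struct NetworkManager {
    received: usize,
}

impl NetworkManager {
    /// The caller lends the queue for the duration of this call only.
    fn handle_message(&mut self, raw: &str, queue: &mut WorkQueue) {
        self.received += 1;
        queue.submit(raw.trim().to_uppercase());
    }
}

struct Node {
    queue: WorkQueue,
    network_manager: NetworkManager,
}

impl Node {
    fn new() -> Self {
        Node {
            queue: WorkQueue { jobs: Vec::new() },
            network_manager: NetworkManager { received: 0 },
        }
    }

    fn on_message(&mut self, raw: &str) {
        // Borrowing two different fields mutably at the same time is allowed.
        self.network_manager.handle_message(raw, &mut self.queue);
    }
}

fn main() {
    let mut node = Node::new();
    node.on_message(" ping ");
    println!("{:?} after {} message(s)", node.queue.jobs, node.network_manager.received);
}
```

Passing only the field that is needed (the queue) rather than the whole Node is what keeps the borrow checker satisfied here: the node can hand out its queue while its network_manager field is borrowed, but it could not hand out itself.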

What's the practical use of Option in Rust? [closed]

Consider this example
fn main() {
    let mut i: Option<i32> = None;
    // after some processing it got some value of 55
    i = Some(55);
    println!("value is {:?}", i.unwrap());
}
In Go, nil represents the zero value of that type.
In Rust, however, None represents the absence of a value. How is absence of a value useful in practice?
When a variable with a type is declared, it must hold some value, whether initialized or uninitialized. Why will one declare it to have it absent?
Also please explain, at what point the memory is allocated for i during the initial declaration or when i gets some value?
I might be asking a stupid question, but want to get my head around the need of this concept.
How is absence of a value useful in practice?
A simple example is a function that looks for the first matching element in a collection. It may find it, and return it, or not find any.
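A small sketch of such a lookup (first_even and describe are hypothetical names). Note that 0 is itself a perfectly valid result here, so no ordinary integer could double as the "nothing found" marker:

```rust
/// Returns the first even number in the slice, if there is one.
fn first_even(values: &[i32]) -> Option<i32> {
    values.iter().copied().find(|v| v % 2 == 0)
}

/// The caller has to handle both cases before it can use the value.
fn describe(values: &[i32]) -> String {
    match first_even(values) {
        Some(v) => format!("found {v}"),
        None => String::from("no even number"),
    }
}

fn main() {
    println!("{}", describe(&[1, 3, 4, 6])); // prints "found 4"
    println!("{}", describe(&[1, 3])); // prints "no even number"
    println!("{}", describe(&[0])); // prints "found 0"
}
```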
The docs give a few more cases:
Initial values
Return values for functions that are not defined over their entire input range (partial functions)
Return value for otherwise reporting simple errors, where None is returned on error
Optional struct fields
Struct fields that can be loaned or "taken"
Optional function arguments
Nullable pointers
Swapping things out of difficult situations
Now, you may ask: why don't we use one of the values to mark an empty one? For two reasons:
There are cases where you do not have a valid "zero-value" or a valid "invalid" value. In this case, you have to use some flag somewhere else to store the fact that something is invalid.
In general, it is simpler to use the same solution everywhere than having to mark and document which is the "none" value.
Why will one declare it to have it absent?
This is different from initialized/uninitialized values. Option is simply a type that contains either "nothing" (None) or a "value" of some type (Some(value)).
You can conceptually see it as a struct with a flag and some space for the value itself.
Also please explain, at what point the memory is allocated for i during the initial declaration or when i gets some value?
In general, that depends on the implementation. A language could decide to implement an option type using a pointer to the value, which means it could delay allocating.
Rust does not do that: Option<T> is an enum stored inline, holding the value plus an extra flag, so the space for i is reserved where i is declared, and assigning Some(55) only writes into that space; nothing is allocated on the heap. Note that, for some types, you can also optimize further and avoid the flag altogether. For instance, if you have an Option of a pointer that can never be null, you can simply use the null value for None. In fact, Rust does such a thing for types like Option<Box<T>>.
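You can check the layout claims yourself with std::mem::size_of. The sketch below only relies on facts about the standard library: Option<Box<T>> and Option<&T> are documented to be pointer-sized, while Option<i32> needs extra room for the flag because every i32 bit pattern is already a valid value. The exact byte counts printed depend on the target.

```rust
use std::mem::size_of;

fn main() {
    // The Option lives inline in the variable; assigning Some(..) reuses that storage.
    let mut i: Option<i32> = None;
    println!("{:?}", i);
    i = Some(55);
    println!("{:?}", i);

    // i32 has no spare bit pattern, so the flag costs extra space.
    println!("i32: {} bytes, Option<i32>: {} bytes",
             size_of::<i32>(), size_of::<Option<i32>>());

    // A Box is never null, so None can be represented as the null pointer.
    println!("Box<i32>: {} bytes, Option<Box<i32>>: {} bytes",
             size_of::<Box<i32>>(), size_of::<Option<Box<i32>>>());

    // The same holds for references.
    println!("&u8: {} bytes, Option<&u8>: {} bytes",
             size_of::<&u8>(), size_of::<Option<&u8>>());
}
```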

Is it always preferable to pass in a mutable reference vs creating and returning an owned value?

Coming to Rust from dynamic languages like Python, I'm not used to the programming pattern where you provide a function with a mutable reference to an empty data structure and that function populates it. A typical example is reading a file into a String:
let mut f = File::open("file.txt").unwrap();
let mut contents = String::new();
f.read_to_string(&mut contents).unwrap();
To my Python-accustomed eyes, an API where you just create an owned value within the function and move it out as a return value looks much more intuitive / ergonomic / what have you:
let mut f = File::open("file.txt").unwrap();
let contents = f.read_to_string().unwrap();
Since the Rust standard library takes the former road, I figure there must be a reason for that.
Is it always preferable to use the reference pattern? If so, why? (Performance reasons? What specifically?) If not, how do I spot the cases where it might be beneficial? Is it mostly useful when I want to return another value in addition to populating the result data structure (as in the first example above, where .read_to_string() returns the number of bytes read)? Why not use a tuple? Is it simply a matter of personal preference?
If read_to_string wanted to return an owned String, this means it would have to heap allocate a new String every time it was called. Also, because Read implementations don't always know how much data there is to be read, it would probably have to incrementally re-allocate the work-in-progress String multiple times. This also means every temporary String has to go back to the allocator to be destroyed.
This is wasteful. Rust is a systems programming language. Systems programming languages abhor waste.
Instead, the caller is responsible for allocating and providing the buffer. If you only call read_to_string once, nothing changes. If you call it more than once, however, you can reuse the same buffer multiple times without the constant allocate/resize/deallocate cycle. Although it doesn't apply in this specific case, similar interfaces can be designed to also support stack buffers, meaning in some cases you can avoid heap activity entirely.
Having the caller pass the buffer in is strictly more flexible than the alternative.
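A minimal sketch of both points, using in-memory byte slices as readers so it runs without any files (byte_lengths and read_all are hypothetical helper names). The first function reuses one String across many reads. The second shows that the "return an owned String" style is easy to build on top of the buffer-taking API, whereas the reverse would not let you avoid the allocation:

```rust
use std::io::{self, Read};

/// Reads every source into the same buffer and records how many bytes each read produced.
fn byte_lengths(sources: &[&str]) -> io::Result<Vec<usize>> {
    let mut buf = String::new();
    let mut lengths = Vec::with_capacity(sources.len());
    for src in sources {
        buf.clear(); // empties the string but keeps its capacity
        let mut reader = src.as_bytes(); // &[u8] implements Read
        let n = reader.read_to_string(&mut buf)?; // appends to buf, returns bytes read
        lengths.push(n);
    }
    Ok(lengths)
}

/// Convenience wrapper in the "Python style": allocate inside, return the owned value.
fn read_all(mut reader: impl Read) -> io::Result<String> {
    let mut contents = String::new();
    reader.read_to_string(&mut contents)?;
    Ok(contents)
}

fn main() -> io::Result<()> {
    println!("{:?}", byte_lengths(&["hello", "hi", ""])?); // prints [5, 2, 0]
    println!("{}", read_all("abc".as_bytes())?); // prints abc
    Ok(())
}
```

For the common one-shot case with files, the standard library also provides std::fs::read_to_string(path), which returns an owned String.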

How should I use storage class specifiers like ref, in, out, etc. in function arguments in D?

There are comparatively many storage class specifiers for functions arguments in D, which are:
none
in (which is equivalent to const scope)
out
ref
scope
lazy
const
immutable
shared
inout
What's the rationale behind them? Their names already suggest the obvious use. However, there are some open questions:
Should I use ref combined with in for struct type function arguments by default?
Does out imply ref implicitly?
When should I use none?
Does ref on classes and/or interfaces make sense? (Class types are references by default.)
How about ref on array slices?
Should I use const for built-in arithmetic types, whenever possible?
More generally put: When and why should I use which storage class specifier for function argument types in case of built-in types, arrays, structs, classes and interfaces?
(In order to isolate the scope of the question a little bit, please don't discuss shared, since it has its own isolated meaning.)
I wouldn't use either by default. ref parameters only take lvalues, and ref implies that you're going to be altering the argument that's being passed in. If you want to avoid copying, then use const ref or auto ref. But const ref still requires an lvalue, so unless you want to duplicate your functions, it's frequently more annoying than it's worth. And while auto ref will avoid copying lvalues (it basically makes it so that there's one version of the function which takes lvalues by ref and one which takes rvalues without ref), it only works with templates, limiting its usefulness. And using const can have far-reaching consequences due to the fact that D's const is transitive and the fact that it's undefined behavior to cast away const from a variable and modify it. So, while it's often useful, using it by default is likely to get you into trouble.
Using in gives you scope in addition to const, which I'd generally advise against. scope on function parameters is supposed to make it so that no reference to that data can escape the function, but the checks for it aren't properly implemented yet, so you can actually use it in a lot more situations than are supposed to be legal. There are some cases where scope is invaluable (e.g. with delegates, since it makes it so that the compiler doesn't have to allocate a closure for it), but for other types, it can be annoying (e.g. if you pass an array by scope, then you couldn't return a slice to that array from the function). And any structs with any arrays or reference types would be affected. And while you won't get many complaints about incorrectly using scope right now, if you've been using it all over the place, you're bound to get a lot of errors once it's fixed. Also, it's utterly pointless for value types, since they have no references to escape. So, using const and using in on a value type (including structs which are value types) are effectively identical.
out is the same as ref except that it resets the parameter to its init value so that you always get the same value passed in regardless of what the previous state of the variable being passed in was.
Almost always, as far as function arguments go. You use const or scope or whatnot when you have a specific need for it, but I wouldn't advise using any of them by default.
Of course it does. ref is separate from the concept of class references. It's a reference to the variable being passed in. If I do
void func(ref MyClass obj)
{
    obj = new MyClass(7);
}
auto var = new MyClass(5);
func(var);
then var will refer to the newly constructed MyClass(7) after the call to func rather than to the original MyClass(5). You're passing the reference by ref. It's just like how taking the address of a reference (like var) gives you a pointer to a reference and not a pointer to a class object.
MyClass* p = &var; //points to var, _not_ to the object that var refers to.
Same deal as with classes. ref makes the parameter refer to the variable passed in. e.g.
void func(ref int[] arr)
{
    arr ~= 5;
}
auto var = [1, 2, 3];
func(var);
assert(var == [1, 2, 3, 5]);
If func didn't take its argument by ref, then var would have been sliced, and appending to arr would not have affected var. But since the parameter was ref, anything done to arr is done to var.
That's totally up to you. Making it const makes it so that you can't mutate it, which means that you're protected from accidentally mutating it if you don't intend to ever mutate it. It might also enable some optimizations, but if you never write to the variable, and it's a built-in arithmetic type, then the compiler knows that it's never altered and the optimizer should be able to do those optimizations anyway (though whether it does or not depends on the compiler's implementation).
immutable and const are effectively identical for the built-in arithmetic types in almost all cases, so personally, I'd just use immutable if I want to guarantee that such a variable doesn't change. In general, using immutable instead of const where you can gives you better optimizations and better guarantees, since it allows the variable to be implicitly shared across threads (if applicable) and it always guarantees that the variable can't be mutated (whereas for reference types, const only means that that particular reference can't mutate the object, not that the object can't be mutated).
Certainly, if you mark your variables const and immutable as much as possible, then it does help the compiler with optimizations at least some of the time, and it makes it easier to catch bugs where you mutated something when you didn't mean to. It also can make your code easier to understand, since you know that the variable is not going to be mutated. So, using them liberally can be valuable. But again, using const or immutable can be overly restrictive depending on the type (though that isn't a problem with the built-in integral types), so just automatically marking everything as const or immutable can cause problems.

Compiler: How to implement Reference Counting (in a simple VM)

I've written a very simple compiler that translates my source language to bytecode. This code is processed by the VM, a simple stack machine, so 3 + 3 gets translated into
push 3
push 3
add
Right now I'm struggling with garbage collection (I want to use reference counting).
I know the basic concept: when a reference gets assigned, the reference counter of that object is incremented, and when it leaves scope, it gets decremented. What's not clear to me is how the GC can free objects that get passed to functions.
Here are some more concrete examples of what I mean:
string a = "im a string" // ok, assignment: refcount + 1 at declaration and - 1 when it leaves scope
print(new Object()) // how is a parameter handled? is the reference incremented before calling the function?
string b = "a" + "b" + "c" // don't know how to solve this: 2 strings get pushed, then concatenated, then the last gets pushed and concatenated again; should the push operation increase the ref count too, and where should the counts be decreased?
I would be glad if anyone could give me links to tutorials on implementing reference counting, or help me with this specific problem if someone has had it before (my problem is that I don't understand when to increment and decrement the references, or where the count is stored).
I think a couple of things can happen with literals. You can treat them like literal numbers, so that they are constants and live forever, or you can have an implicit variable that has a retain count of 1 before print and is released after.
In response to your edit:
You can use the implicit variable solution, or you can use the "autorelease" concept from Objective-C. An object is placed in the autorelease pool, which will release it a short time later; in the meantime, the receiver of the object can retain it.
First, what types of objects does your language allow to be put on the heap? Strings? Do you have mutable or immutable strings?
Check out this post about Strings in Java. So in a Java like language strings get copied every time you concatenate them because they are immutable. Also "this is a string" is actually a call to the constructor of the string class.
If the argument to print() is a call to a constructor (new Object()), there is no reference to the object in the scope calling the function, so the object lives in the scope of the function, and its counter should be incremented on entering and decremented on leaving the scope of the print() function. If the constructor is called in the calling scope and assigned to a variable, it lives in the calling scope.
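One way to picture the bookkeeping is a rough sketch of a tiny stack machine written in Rust. This is not the asker's VM, and every name in it is hypothetical. It uses std::rc::Rc as the reference counter, which keeps the count in the same heap allocation as the object. The rule it follows is "every slot that holds a reference owns one count": pushing a constant adds one, popping a temporary removes one, and moving a value from the stack into a local changes nothing.

```rust
use std::rc::Rc;

enum Op {
    PushConst(usize), // push a reference to constant #n
    Concat,           // pop two strings, push their concatenation
    Store(usize),     // pop the top of the stack into local slot #n
}

struct Vm {
    consts: Vec<Rc<String>>,         // literals: the pool itself holds one count each
    stack: Vec<Rc<String>>,          // each stack slot holds one count
    locals: Vec<Option<Rc<String>>>, // each occupied local holds one count
}

impl Vm {
    fn new(consts: &[&str], local_slots: usize) -> Self {
        Vm {
            consts: consts.iter().map(|s| Rc::new(s.to_string())).collect(),
            stack: Vec::new(),
            locals: vec![None; local_slots],
        }
    }

    fn run(&mut self, program: &[Op]) {
        for op in program {
            match op {
                // Rc::clone increments the count: the stack now also refers to the literal.
                Op::PushConst(i) => self.stack.push(Rc::clone(&self.consts[*i])),
                Op::Concat => {
                    let rhs = self.stack.pop().expect("stack underflow");
                    let lhs = self.stack.pop().expect("stack underflow");
                    // The result is a fresh object whose only reference is its stack slot.
                    self.stack.push(Rc::new(format!("{lhs}{rhs}")));
                    // `lhs` and `rhs` go out of scope here: each count drops by one,
                    // and an intermediate string whose count reaches zero is freed.
                }
                Op::Store(slot) => {
                    // Moving stack -> local transfers the count; the previous
                    // occupant of the slot (if any) is released.
                    let value = self.stack.pop().expect("stack underflow");
                    self.locals[*slot] = Some(value);
                }
            }
        }
    }
}

fn main() {
    // string b = "a" + "b" + "c"
    let mut vm = Vm::new(&["a", "b", "c"], 1);
    vm.run(&[
        Op::PushConst(0),
        Op::PushConst(1),
        Op::Concat,
        Op::PushConst(2),
        Op::Concat,
        Op::Store(0),
    ]);
    let b = vm.locals[0].as_ref().expect("slot 0 was stored");
    println!("{b} has {} reference(s)", Rc::strong_count(b)); // prints "abc has 1 reference(s)"
}
```

Under the same rule, a call like print(new Object()) needs no special case: the argument is a temporary on the stack with a count of 1, the callee takes it over as a parameter, and it is freed when the callee's frame is torn down.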
For reading up on this, Wikipedia is a good start, but Andrew Appel's compiler book would be handy to have (there should be a 2nd edition out there, and there are C and ML versions of the book too). Lambda-the-Ultimate is where many programming language researchers discuss things, so it is definitely a place worth looking at.
