Immutability: property of variable, value, or API? - programming-languages

I'm working on designing a gradually typed language as a personal project and I'm stuck at a particular design decision regarding immutability.
Talking about this in a language agnostic (and probably crude) way, I say there are two things that could be immutable or constant: the variable binding and the value itself.
The variable binding being constant is like const in Javascript or final in Java, where the variable cannot be reassigned. The value itself being immutable is like const in C++ or Object.freeze() in Javascript.
The question I'm having is, in the case of immutable values (not bindings) what should immutability be a property of?
The value/object itself in Javascript's Object.freeze?
var point = {x: 10};
var frozenPoint = Object.freeze(point);
or part of the type as in C++?
const Point p(10);
or as the variable binding as in Rust?
let p = Point { x: 10 };
// vs let mut p = Point { x: 10 };
or as part of the API of a library? Facebook's Immutable.js, Google's Guava library for Java (ImmutableList class), etc.
I understand that there's probably no "correct" answer for this, so what I'm really looking for is a comparison of the philosophies and motivations for these approaches.

There is a correct answer, but it is very different than what you expect.
The best is to not have mutability at all. In other words : the language should be purely functional. There is no reason to have mutability in a language with garbage collection. Haskell is a proof of this.

Related

To what extent is Rust shadowing zero-cost?

A Zero-Runtime-Cost Mixed List in Rust outlines how to create a heterogenous list in Rust using tuples and normal traits (not trait objects like this question suggests). The list seems to rely heavily on shadowing and effectively changes the entire type of the list every time a new element is added.
The implementation seems brilliant to me, but after reviewing a few Rust's homepage and resources I could not find anyplace that explicitly defines shadowing as zero-cost. As far as I know, repeatedly abandoning data on the stack is less costly than indirection, but repeatedly copying and adding to existing data instead of mutating it sounds pretty expensive.
What you don’t use, you don’t pay for. And further: What you do use, you couldn’t hand code any better.
Bjarne Stroustrup
Shadowing seems to fulfill the first requirement, but the second?
Is Rust's shadowing actually zero-cost?
Official Rust material tries very hard to never talk about "zero cost" by itself, so you'll have to cite where you see "zero cost" without further qualification. The article states zero runtime cost, so the author of the post is aware of that. In most cases, "zero cost" is used in the context of zero-cost abstractions.
Your Stroustrup quote only partially and obliquely deals with zero-cost abstractions. A better explanation, emphasis mine:
It means paying no penalty for the abstraction, or said otherwise, it means that whether you use the abstraction or instead go for the "manual" implementation you end up having the same costs (same speed, same memory consumption, ...).
Matthieu M.
This means that any time you see "zero-cost abstraction", you have to have something to compare the abstraction against; only then can you tell if it's truly zero-cost.
I don't think that shadowing even counts as an abstraction, but let's pretend it does (and I'll word the rest of my answer as if I believe it is).
Shadowing a variable means having multiple distinct variables with the same name, with the later ones precluding access to the previous ones. The non-"abstract" version of that is having multiple distinct variables of different names. I'd say that having two variables of the same name is the same cost as having two variables of different names, so it is a zero-cost abstraction.
See also:
Why do I need rebinding/shadowing when I can have mutable variable binding?
In Rust, what's the difference between "shadowing" and "mutability"?
Playing the game further, you can ask "is having two variables a zero-cost abstraction?". I'd say that this depends on what the variables are and how they relate.
In this example, I'd say that this is a zero-cost abstraction as there's no more efficient way I could have written the code.
fn example() {
let a = String::new();
let a = a;
}
On the other hand, I'd say that this is not a zero-cost abstraction, as the first a will not be deallocated until the end of the function:
fn example() {
let a = String::new();
let a = String::new();
}
A better way I could choose to write it would be to call drop in the middle. There are good reasons that Rust doesn't do this, but it's not as efficient in regards to memory usage as a hand-written implementation could be.
See also:
Is it possible in Rust to delete an object before the end of scope?
Does Rust free up the memory of overwritten variables?
Is the resource of a shadowed variable binding freed immediately?

Can associated constants be used to initialize the length of fixed size arrays?

In C++, you have the ability to pass integrals inside templates
std::array<int, 3> arr; //fixed size array of 3
I know that Rust has built in support for this, but what if I wanted to create something like linear algebra vector library?
struct Vec<T, size: usize> {
data: [T; size],
}
type Vec3f = Vec<f32, 3>;
type Vec4f = Vec<f32, 4>;
This is currently what I do in D. I have heard that Rust now has Associated Constants.
I haven't used Rust in a long time but this doesn't seem to address this problem at all or have I missed something?
As far as I can see, associated constants are only available in traits and that would mean I would still have to create N vector types by hand.
No, associated constants don't help and aren't intended to. Associated anything are outputs while use cases such as the one in the question want inputs. One could in principle construct something out of type parameters and a trait with associated constants (at least, as soon as you can use associated constants of type parameters — sadly that doesn't work yet). But that has terrible ergonomics, not much better than existing hacks like typenum.
Integer type parameters are highly desired since, as you noticed, they enable numerous things that aren't really feasible in current Rust. People talk about this and plan for it but it's not there yet.
Integer type parameters are not supported as of now, however there's an RFC for that IIRC, and a long-standing discussion.
You could use typenum crate in the meanwhile.

How should I use storage class specifiers like ref, in, out, etc. in function arguments in D?

There are comparatively many storage class specifiers for functions arguments in D, which are:
none
in (which is equivalent to const scope)
out
ref
scope
lazy
const
immutable
shared
inout
What's the rational behind them? Their names already put forth the obvious use. However, there are some open questions:
Should I use ref combined with in for struct type function arguments by default?
Does out imply ref implicitely?
When should I use none?
Does ref on classes and/or interfaces make sense? (Class types are references by default.)
How about ref on array slices?
Should I use const for built-in arithmetic types, whenever possible?
More generally put: When and why should I use which storage class specifier for function argument types in case of built-in types, arrays, structs, classes and interfaces?
(In order to isolate the scope of the question a little bit, please don't discuss shared, since it has its own isolated meaning.)
I wouldn't use either by default. ref parameters only take lvalues, and it implies that you're going to be altering the argument that's being passed in. If you want to avoid copying, then use const ref or auto ref. But const ref still requires an lvalue, so unless you want to duplicate your functions, it's frequently more annoying than it's worth. And while auto ref will avoid copying lvalues (it basically makes it so that there's a version of the function which takes an lvalues by ref and one which takes rvalues without ref), it only works with templates, limiting its usefulness. And using const can have far-reaching consequences due to the fact that D's const is transitive and the fact that it's undefined behavior to cast away const from a variable and modify it. So, while it's often useful, using it by default is likely to get you into trouble.
Using in gives you scope in addition to const, which I'd generally advise against. scope on function parameters is supposed to make it so that no reference to that data can escape the function, but the checks for it aren't properly implemented yet, so you can actually use it in a lot more situations than are supposed to be legal. There are some cases where scope is invaluable (e.g. with delegates, since it makes it so that the compiler doesn't have to allocate a closure for it), but for other types, it can be annoying (e.g. if you pass an array be scope, then you couldn't return a slice to that array from the function). And any structs with any arrays or reference types would be affected. And while you won't get many complaints about incorrectly using scope right now, if you've been using it all over the place, you're bound to get a lot of errors once it's fixed. Also, its utterly pointless for value types, since they have no references to escape. So, using const and in on a value type (including structs which are value types) are effectively identical.
out is the same as ref except that it resets the parameter to its init value so that you always get the same value passed in regardless of what the previous state of the variable being passed in was.
Almost always as far as function arguments go. You use const or scope or whatnot when you have a specific need it, but I wouldn't advise using any of them by default.
Of course it does. ref is separate from the concept of class references. It's a reference to the variable being passed in. If I do
void func(ref MyClass obj)
{
obj = new MyClass(7);
}
auto var = new MyClass(5);
func(var);
then var will refer the newly constructed new MyClass(7) after the call to func rather than the new MyClass(5). You're passing the reference by ref. It's just like how taking the address of a reference (like var) gives you a pointer to a reference and not a pointer to a class object.
MyClass* p = &var; //points to var, _not_ to the object that var refers to.
Same deal as with classes. ref makes the parameter refer to the variable passed in. e.g.
void func(ref int[] arr)
{
arr ~= 5;
}
auto var = [1, 2, 3];
func(var);
assert(var == [1, 2, 3, 5]);
If func didn't take its argument by ref, then var would have been sliced, and appending to arr would not have affected var. But since the parameter was ref, anything done to arr is done to var.
That's totally up to you. Making it const makes it so that you can't mutate it, which means that you're protected from accidentally mutating it if you don't intend to ever mutate it. It might also enable some optimizations, but if you never write to the variable, and it's a built-in arithmetic type, then the compiler knows that it's never altered and the optimizer should be able to do those optimizations anyway (though whether it does or not depends on the compiler's implementation).
immutable and const are effectively identical for the built-in arithmetic types in almost all cases, so personally, I'd just use immutable if I want to guarantee that such a variable doesn't change. In general, using immutable instead of const if you can gives you better optimizations and better guarantees, since it allows the variable to be implicitly shared across threads (if applicable) and it always guarantees that the variable can't be mutated (whereas for reference types, const just means only that that reference can't mutate the object, not that it can't be mutated).
Certainly, if you mark your variables const and immutable as much as possible, then it does help the compiler with optimizations at least some of the time, and it makes it easier to catch bugs where you mutated something when you didn't mean to. It also can make your code easier to understand, since you know that the variable is not going to be mutated. So, using them liberally can be valuable. But again, using const or immutable can be overly restrictive depending on the type (though that isn't a problem with the built-in integral types), so just automatically marking everything as const or immutable can cause problems.

Compile-time constraints for strings in F#, similar to Units of Measure - is it possible?

I'm developing a Web application using F#. Thinking of protecting user input strings from SQL, XSS, and other vulnerabilities.
In two words, I need some compile-time constraints that would allow me discriminate plain strings from those representing SQL, URL, XSS, XHTML, etc.
Many languages have it, e.g. Ruby’s native string-interpolation feature #{...}.
With F#, it seems that Units of Measure do very well, but they are only available for numeric types.
There are several solutions employing runtime UoM (link), however I think it's an overhead for my goal.
I've looked into FSharpPowerPack, and it seems quite possible to come up with something similar for strings:
[<MeasureAnnotatedAbbreviation>] type string<[<Measure>] 'u> = string
// Similarly to Core.LanguagePrimitives.IntrinsicFunctions.retype
[<NoDynamicInvocation>]
let inline retype (x:'T) : 'U = (# "" x : 'U #)
let StringWithMeasure (s: string) : string<'u> = retype s
[<Measure>] type plain
let fromPlain (s: string<plain>) : string =
// of course, this one should be implemented properly
// by invalidating special characters and then assigning a proper UoM
retype s
// Supposedly populated from user input
let userName:string<plain> = StringWithMeasure "John'); DROP TABLE Users; --"
// the following line does not compile
let sql1 = sprintf "SELECT * FROM Users WHERE name='%s';" userName
// the following line compiles fine
let sql2 = sprintf "SELECT * FROM Users WHERE name='%s';" (fromPlain userName)
Note: It's just a sample; don't suggest using SqlParameter. :-)
My questions are: Is there a decent library that does it? Is there any possibility to add syntax sugar?
Thanks.
Update 1: I need compile-time constraints, thanks Daniel.
Update 2: I'm trying to avoid any runtime overhead (tuples, structures, discriminated unions, etc).
A bit late (I'm sure there's a time format where there is only one bit different between February 23rd and November 30th), I believe these one-liners are compatible for your goal:
type string<[<Measure>] 'm> = string * int<'m>
type string<[<Measure>] 'm> = { Value : string }
type string<[<Measure>] 'm>(Value : string) = struct end
In theory it's possible to use 'units' to provide various kinds of compile-time checks on strings (is this string 'tainted' user input, or sanitized? is this filename relative or absolute? ...)
In practice, I've personally not found it to be too practical, as there are so many existing APIs that just use 'string' that you have to exercise a ton of care and manual conversions plumbing data from here to there.
I do think that 'strings' are a huge source of errors, and that type systems that deal with taintedness/canonicalization/etc on strings will be one of the next leaps in static typing for reducing errors, but I think that's like a 15-year horizon. I'd be interested in people trying an approach with F# UoM to see if they get any benefit, though!
The simplest solution to not being able to do
"hello"<unsafe_user_input>
would be to write a type which had some numeric type to wrap the string like
type mystring<'t>(s:string) =
let dummyint = 1<'t>
Then you have a compile time check on your strings
It's hard to tell what you're trying to do. You said you "need some runtime constraints" but you're hoping to solve this with units of measure, which are strictly compile-time. I think the easy solution is to create SafeXXXString classes (where XXX is Sql, Xml, etc.) that validate their input.
type SafeSqlString(sql) =
do
//check `sql` for injection, etc.
//raise exception if validation fails
member __.Sql = sql
It gives you run-time, not compile-time, safety. But it's simple, self-documenting, and doesn't require reading the F# compiler source to make it work.
But, to answer your question, I don't see any way to do this with units of measure. As far as syntactic sugar goes, you might be able to encapsulate it in a monad, but I think it will make it more clunky, not less.
You can use discriminated unions:
type ValidatedString = ValidatedString of string
type SmellyString = SmellyString of string
let validate (SmellyString s) =
if (* ... *) then Some(ValidatedString s) else None
You get a compile-time check, and adding two validated strings won't generate a validated string (which units of measure would allow).
If the added overhead of the reference types is too big, you can use structs instead.

What is a Pointer? [duplicate]

This question already has answers here:
Closed 14 years ago.
See: Understanding Pointers
In many C flavoured languages, and some older languages like Fortran, one can use Pointers.
As someone who has only really programmed in basic, javascript, and actionscript, can you explain to me what a Pointer is, and what it is most useful for?
Thanks!
This wikipedia article will give you detailed information on what a pointer is:
In computer science, a pointer is a programming language data type whose value refers directly to (or "points to") another value stored elsewhere in the computer memory using its address. Obtaining or requesting the value to which a pointer refers is called dereferencing the pointer. A pointer is a simple implementation of the general reference data type (although it is quite different from the facility referred to as a reference in C++). Pointers to data improve performance for repetitive operations such as traversing string and tree structures, and pointers to functions are used for binding methods in Object-oriented programming and run-time linking to dynamic link libraries (DLLs).
A pointer is a variable that contains the address of another variable. This allows you to reference another variable indirectly. For example, in C:
// x is an integer variable
int x = 5;
// xpointer is a variable that references (points to) integer variables
int *xpointer;
// We store the address (& operator) of x into xpointer.
xpointer = &x;
// We use the dereferencing operator (*) to say that we want to work with
// the variable that xpointer references
*xpointer = 7;
if (5 == x) {
// Not true
} else if (7 == x) {
// True since we used xpointer to modify x
}
Pointers are not as hard as they sound. As others have said already, they are variables that hold the address of some other variable. Suppose I wanted to give you directions to my house. I wouldn't give you a picture of my house, or a scale model of my house; I'd just give you the address. You could infer whatever you needed from that.
In the same way, a lot of languages make the distinction between passing by value and passing by reference. Essentially it means will i pass an entire object around every time I need to refer to it? Or, will I just give out it's address so that others can infer what they need?
Most modern languages hide this complexity by figuring out when pointers are useful and optimizing that for you. However, if you know what you're doing, manual pointer management can still be useful in some situations.
There have been several discussions in SO about this topic. You can find information about the topic with the links below. There are several other relevant SO discussions on the subject, but I think that these were the most relevant. Search for 'pointers [C++]' in the search window (or 'pointers [c]') and you will get more information as well.
In C++ I Cannot Grasp Pointers and Classes
What is the difference between modern ‘References’ and traditional ‘Pointers’?
As someone already mention, a pointer is a variable that contains the address of another variable.
It's mostly used when creating new objects (in run-time).

Resources