In Haskell, does mutability always have to be reflected in type system? - haskell

I'm new to Haskell, so please forgive if this question is dumb.
Imagine that we have two data structures bound to the names x and y.
x is mutable.
y is not.
As a matter or principle, does x necessarily have a different type than y?

Short answer: yes.
In Haskell, all variables are technically immutable. Unlike in, say, JavaScript, once you've defined a new variable in Haskell and bound it to an initial value:
let x = expression1 in body1
that's its value forever. Everywhere that x is used in body1, its value is guaranteed to be the (fixed) value of expression1. (Well, technically, you can define a brand new immutable variable x with the same name in body1 and then some of the xs in body might refer to that new variable, but that's a whole different matter.)
Haskell does have mutable data structures. We use them by creating a new mutable structure and binding that to an immutable variable:
do ...
xref <- newIORef (15 :: Int)
...
Here, the value of xref itself is not the integer 15. Rather, xref is assigned an immutable, undisplayable value that identifies a "container" whose contents are currently 15. To use this container, we need to explicitly pull values out of it, which we can assign to immutable variables:
value_of_x_right_now <- readIORef xref
and explicitly put values into it:
writeIORef xref (value_of_x_right_now + 15)
There's obviously much more that can be said about these values, and the way in which they generally need to be accessed via monadic operations in IO or another monad.
But, even setting that aside, it should be clear that the integer 15 and a "container whose contents are initialized to the integer 15" are objects with necessarily different types. In this case, the types are Int and IORef Int respectively.
So, the mutability of data structures will necessarily be reflected at the type level, simply by virtue of the fact that mutability in Haskell is implemented via these sorts of "containers", and the type of a value is not the same as the type of a container containing that value.

Related

How do fixpoint, variable, let and tag schema constructors work in Winery?

I had previously asked where the Winery types are indexed. I noticed that in the serialization for the schema for Bool, which is [4,6], the 4 is the version number, and 6 is the index of SBool in SchemaP. I verified the hypothesis using other "primitive" types like Integer (serialization: 16), Double (18), Text (20). Also, [Bool] will be SVector SBool, serialized to [4,2,6], which makes sense: the 2 is for SVector, the 6 is for SBool.
But SchemaP also contains constructors that I don't intuitively see how are used: SFix, SVar, STag and SLet. What are they, and which type would I need to construct the schema for, to see them used? Why is SLet at the end, but SFix at the beginning?
SFix looks like a µ quantifier for a recursive type. The type µx. T is the type T where x refers to the whole type µx. T. For example, a list data List a = Nil | Cons a (List a) can be represented as L(a) = µr. 1 + a × r, where the recursive occurrence of the type is replaced with the variable r. You could probably see this with a recursive user-defined type like data BinTree a = Leaf | Branch a (BinTree a) (BinTree a).
This encoding doesn’t explicitly include a variable name, because the next constructor SVar specifies that “SVar n refers to the nth innermost fixpoint”, where n is an Int in the synonym type Schema = SchemaP Int. This is a De Bruijn index. If you had some nested recursive types like µx. µy. … = SFix (SFix …), then the inner variable would be referenced as SVar 0 and the outer one as SVar 1 within the body …. This “relative” notation means you can freely reorganise terms without worrying about having to rename variables for capture-avoiding substitution.
SLet is a let binding, and since it’s specified as SLet !(SchemaP a) !(SchemaP a), I presume that SLet e1 e2 is akin to let x = e1 in e2, where the variable is again implicit. So I suspect this may be a deficiency of the docs, and SVar can also refer to Let-bound variables. I don’t know how they use this constructor, but it could be used to make sharing explicit in the schema.
Finally, STag appears to be a way to attach extra “tag” metadata within the schema, in some way that’s specific to the library.
The ordering of these constructors might be maintained for compatibility with earlier versions, so adding new constructors at the end would avoid disturbing the encoding, and I figure the STag and SLet constructors at the end were simply added later.

Understanding recurive let expression in lambda calculus with Haskell, OCaml and nix language

I'm trying to understand how recursive set operate internally by comparing similar feature in another functional programming languages and concepts.
I can find it in wiki. In that, I need to know Y combinator, fixed point. I can get it briefly in wiki.
Then, now I start to apply this in Haskell.
Haskell
It is easy. But I want to know behind the scenes.
*Main> let x = y; y = 10; in x
10
When you write a = f b in a lazy functional language like Haskell or Nix, the meaning is stronger than just assignment. a and f b will be the same thing. This is usually called a binding.
I'll focus on a Nix example, because you're asking about recursive sets specifically.
A simple attribute set
Let's look at the initialization of an attribute set first. When the Nix interpreter is asked to evaluate this file
{ a = 1 + 1; b = true; }
it parses it and returns a data structure like this
{ a = <thunk 1>; b = <thunk 2>; }
where a thunk is a reference to the relevant syntax tree node and a reference to the "environment", which behaves like a dictionary from identifiers to their values, although implemented more efficiently.
Perhaps the reason we're evaluating this file is because you requested nix-build, which will not just ask for the value of a file, but also traverse the attribute set when it sees that it is one. So nix-build will ask for the value of a, which will be computed from its thunk. When the computation is complete, the memory that held the thunk is assigned the actual value, type = tInt, value.integer = 2.
A recursive attribute set
Nix has a special syntax that combines the functionality of attribute set construction syntax ({ }) and let-binding syntax. This is avoids some repetition when you're constructing attribute sets with some shared values.
For example
let b = 1 + 1;
in { b = b; a = b + 5; }
can be expressed as
rec { b = 1 + 1; a = b + 5; }
Evaluation works in a similar manner.
At first the evaluator returns a representation of the attribute set with all thunks, but this time the thunks reference an new environment that includes all the attributes, on top of the existing lexical scope.
Note that all these representations can be constructed while performing a minimal amount of work.
nix-build traverses attrsets in alphabetic order, so it will evaluate a first. It's a thunk that references the a + syntax node and an environment with b in it. Evaluating this requires evaluating the b syntax node (an ExprVar), which references the environment, where we find the 1 + 1 thunk, which is changed to a tInt of 2 as before.
As you can see, this process of creating thunks but only evaluating them when needed is indeed lazy and allows us to have various language constructs with their own scoping rules.
Haskell implementations usually follow a similar pattern, but may compile the code rather than interpret a syntax tree, and resolve all variable references to constant memory offsets completely. Nix tries to do this to some degree, but it must be able to fall back on strings because of the inadvisable with keyword that makes the scope dynamic.
I guess several things by myself.
In eagar evaluation language, I must declare before use it. So the order of declaration is simple.
int x = 10;
int y = x;
Just for Nix language
In wiki, there isn't any concept comparision with Haskell though let ... in is compared with Haskell.
lexical scope
all variables are lexically scoped.
mutual recursion
https://en.wikipedia.org/wiki/Let_expression#Mutually_recursive_let_expression

How does Haskell know whether a data type declaration is a variable or a named type?

Take a data type declaration like
data myType = Null | Container TypeA v
As I understand it, Haskell would read this as myType coming in two different flavors. One of them is Null which Haskell interprets just as some name of a ... I guess you'd call it an instance of the type? Or a subtype? Factor? Level? Anyway, if we changed Null to Nubb it would behave in basically the same way--Haskell doesn't really know anything about null values.
The other flavor is Container and I would expect Haskell to read this as saying that the Container flavor takes two fields, TypeA and v. I expect this is because, when making this type definition, the first word is always read as the name of the flavor and everything that follows is another field.
My question (besides: did I get any of that wrong?) is, how does Haskell know that TypeA is a specific named type rather than an un-typed variable? Am I wrong to assume that it reads v as an un-typed variable, and if that's right, is it because of the lower-case initial letter?
By un-typed I mean how the types appear in the following type-declaration for a function:
func :: a -> a
func a = a
First of all, terminology: "flavors" are called "cases" or "constructors". Your type has two cases - Null and Container.
Second, what you call "untyped" is not really "untyped". That's not the right way to think about it. The a in declaration func :: a -> a does not mean "untyped" the same way variables are "untyped" in JavaScript or Python (though even that is not really true), but rather "whoever calls this function chooses the type". So if I call func "abc", then I have chosen a to be String, and now the compiler knows that the result of this call must also be String, since that's what the func's signature says - "I take any type you choose, and I return the same type". The proper term for this is "generic".
The difference between "untyped" and "generic" is that "untyped" is free-for-all, the type will only be known at runtime, no guarantees whatsoever; whereas generic types, even though not precisely known yet, still have some sort of relationship between them. For example, your func says that it returns the same type it takes, and not something random. Or for another example:
mkList :: a -> [a]
mkList a = [a]
This function says "I take some type that you choose, and I will return a list of that same type - never a list of something else".
Finally, your myType declaration is actually illegal. In Haskell, concrete types have to be Capitalized, while values and type variables are javaCase. So first, you have to change the name of the type to satisfy this:
data MyType = Null | Container TypeA v
If you try to compile this now, you'll still get an error saying that "Type variable v is unknown". See, Haskell has decided that v must be a type variable, and not a concrete type, because it's lower case. That simple.
If you want to use a type variable, you have to declare it somewhere. In function declaration, type variables can just sort of "appear" out of nowhere, and the compiler will consider them "declared". But in a type declaration you have to declare your type variables explicitly, e.g.:
data MyType v = Null | Container TypeA v
This requirement exist to avoid confusion and ambiguity in cases where you have several type variables, or when type variables come from another context, such as a type class instance.
Declared this way, you'll have to specify something in place of v every time you use MyType, for example:
n :: MyType Int
n = Null
mkStringContainer :: TypeA -> String -> MyType String
mkStringContainer ta s = Container ta s
-- Or make the function generic
mkContainer :: TypeA -> a -> MyType a
mkContainer ta a = Container ta a
Haskell uses a critically important distinction between variables and constructors. Variables begin with a lower-case letter; constructors begin with an upper-case letter1.
So data myType = Null | Container TypeA v is actually incorrect; the first symbol after the data keyword is the name of the new type constructor you're introducing, so it must start with a capital letter.
Assuming you've fixed that to data MyType = Null | Container TypeA v, then each of the alternatives separated by | is required to consist of a data constructor name (here you've chosen Null and Container) followed by a type expression for each of the fields of that constructor.
The Null constructor has no fields. The Container constructor has two fields:
TypeA, which starts with a capital letter so it must be a type constructor; therefore the field is of that concrete type.
v, which starts with a lowercase letter and is therefore a type variable. Normally this variable would be defined as a type parameter on the MyType type being defined, like data MyType v = Null | Container TypeA v. You cannot normally use free variables, so this was another error in your original example.2
Your data declaration showed how the distinction between constructors and variables matters at the type level. This distinction between variables and constructors is also present at the value level. It's how the compiler can tell (when you're writing pattern matches) which terms are patterns it should be checking the data against, and which terms are variables that should be bound to whatever the data contains. For example:
lookAtMaybe :: Show a => Maybe a -> String
lookAtMaybe Nothing = "Nothing to see here"
lookAtMaybe (Just x) = "I found: " ++ show x
If Haskell didn't have the first-letter rule, then there would be two possible interpretations of the first clause of the function:
Nothing could be a reference to the externally-defined Nothing constructor, saying I want this function rule to apply when the argument matches that constructor. This is the interpretation the first-letter rule mandates.
Nothing could be a definition of an (unused) variable, representing the function's argument. This would be the equivalent of lookAtMaybe x = "Nothing to see here"
Both of those interpretations are valid Haskell code producing different behaviour (try changing the capital N to a lower case n and see what the function does). So Haskell needs a rule to choose between them. The designers chose the first-letter rule as a way of simply disambiguating constructors from variables (that is simple to both the compiler and to human readers) without requiring any additional syntactic noise.
1 The rule about the case of the first letter applies to alphanumeric names, which can only consist of letters, numbers, and underscores. Haskell also has symbolic names, which consists only of symbol characters like +, *, :, etc. For these, the rule is that names beginning with the : character are constructors, while names beginning with another character are variables. This is how the list constructor : is distinguished from a function name like +.
2 With the ExistentialQuantification extension turned on it is possible to write data MyType = Null | forall v. Container TypeA v, so that the the constructor has a field with a variable type and the variable does not appear as a parameter to the overall type. I'm not going to explain how this works here; it's generally considered an advanced feature, and isn't part of standard Haskell code (which is why it requires an extension)

Pass by Reference in Haskell?

Coming from a C# background, I would say that the ref keyword is very useful in certain situations where changes to a method parameter are desired to directly influence the passed value for value types of for setting a parameter to null.
Also, the out keyword can come in handy when returning a multitude of various logically unconnected values.
My question is: is it possible to pass a parameter to a function by reference in Haskell? If not, what is the direct alternative (if any)?
There is no difference between "pass-by-value" and "pass-by-reference" in languages like Haskell and ML, because it's not possible to assign to a variable in these languages. It's not possible to have "changes to a method parameter" in the first place in influence any passed variable.
It depends on context. Without any context, no, you can't (at least not in the way you mean). With context, you may very well be able to do this if you want. In particular, if you're working in IO or ST, you can use IORef or STRef respectively, as well as mutable arrays, vectors, hash tables, weak hash tables (IO only, I believe), etc. A function can take one or more of these and produce an action that (when executed) will modify the contents of those references.
Another sort of context, StateT, gives the illusion of a mutable "state" value implemented purely. You can use a compound state and pass around lenses into it, simulating references for certain purposes.
My question is: is it possible to pass a parameter to a function by reference in Haskell? If not, what is the direct alternative (if any)?
No, values in Haskell are immutable (well, the do notation can create some illusion of mutability, but it all happens inside a function and is an entirely different topic). If you want to change the value, you will have to return the changed value and let the caller deal with it. For instance, see the random number generating function next that returns the value and the updated RNG.
Also, the out keyword can come in handy when returning a multitude of various logically unconnected values.
Consequently, you can't have out either. If you want to return several entirely disconnected values (at which point you should probably think why are disconnected values being returned from a single function), return a tuple.
No, it's not possible, because Haskell variables are immutable, therefore, the creators of Haskell must have reasoned there's no point of passing a reference that cannot be changed.
consider a Haskell variable:
let x = 37
In order to change this, we need to make a temporary variable, and then set the first variable to the temporary variable (with modifications).
let tripleX = x * 3
let x = tripleX
If Haskell had pass by reference, could we do this?
The answer is no.
Suppose we tried:
tripleVar :: Int -> IO()
tripleVar var = do
let times_3 = var * 3
let var = times_3
The problem with this code is the last line; Although we can imagine the variable being passed by reference, the new variable isn't.
In other words, we're introducing a new local variable with the same name;
Take a look again at the last line:
let var = times_3
Haskell doesn't know that we want to "change" a global variable; since we can't reassign it, we are creating a new variable with the same name on the local scope, thus not changing the reference. :-(
tripleVar :: Int -> IO()
tripleVar var = do
let tripleVar = var
let var = tripleVar * 3
return()
main = do
let x = 4
tripleVar x
print x -- 4 :(

Growing a list in haskell

I'm learning Haskell by writing an OSC musical sequencer to use it with SuperCollider. But because I'd like to make fairly complex stuff with it, it will work like a programming language where you can declare variables and define functions so you can write music in an algorithmic way. The grammar is unusual in that we're coding sequences and sometimes a bar will reference the last bar (something like "play that last chord again but a fifth above").
I don't feel satisfied with my own explanation, but that's the best I can without getting too technical.
Anyway, what I'm coding now is the parser for that language, stateless so far, but now I need some way to implement a growing list of the declared variables and alikes using a dictionary in the [("key","value")] fashion, so I can add new values as I go parsing bar by bar.
I know this involves monads, which I don't really understand yet, but I need something meaningful enough to start toying with them or else I find the raw theory a bit too raw.
So what would be a clean and simple way to start?
Thanks and sorry if the question was too long.
Edit on how the thing works:
we input a string to the main parsing function, say
"afunction(3) ; anotherone(1) + [3,2,1]"
we identify closures first, then kinds of chars (letters, nums, etc) and group them together, so we get a list like:
[("word","afunction"),("parenth","(3)"),("space"," "),("semicolon",";"),("space"," "),("word","anotherone"),("parenth","(1)"),("space"," "),("opadd","+"),("space"," "),("bracket","[3,2,1]")]
then we use a function that tags all those tuples with the indices of the original string they occupy, like:
[("word","afunction",(0,8)),("parenth","(3)",(9,11)),("space"," ",(12,13)) ...]
then cut it in a list of bars, which in my language are separated using a semicolon, and then in notes, using commas.
And now I'm at the stage where those functions should be executed sequentially, but because some of them are reading or modifying previously declared values, I need to keep track of that change. For example, let's say the function f(x) moves the pitch of the last note by x semitones, so
f(9), -- from an original base value of 0 (say that's an A440) we go to 9
f(-2), -- 9-2 = 7, so a fifth from A
f(-3); -- 9-2-3, a minor third down from the last value.
etc
But sometimes it can get a bit more complicated than that, don't make me explain how cause I could bore you to death.
Adding an item to a list
You can make a new list that contains one more item than an existing list with the : constructor.
("key", "value") : existing
Where existing is a list you've already made
Keeping track of changing state
You can keep track of changing state between functions by passing the state from each function to the next. This is all the State monad is doing. State s a is a value of type a that depends on (and changes) a state s.
{- ┌---- type of the state
v v-- type of the value -}
data State s a = State { runState :: s -> (a, s) }
{- ^ ^ ^ ^
a function ---|--┘ | |
that takes a state ---┘ | |
and returns | |
a value that depends on the state ---┘ |
and a new state ------┘ -}
The bind operation >>= for State takes a value that depends on (and changes) the state and a function to compute another value that depends on (and changes) the state and combines them to make a new value that depends on (and changes) the state.
m >>= k = State $ \s ->
let ~(a, s') = runState m s
in runState (k a) s'

Resources