Haskell record syntax - layout

Haskell's record syntax is considered by many to be a wart on an otherwise elegant language, on account of its ugly syntax and namespace pollution. On the other hand it's often more useful than the position based alternative.
Instead of a declaration like this:
data Foo = Foo {
fooID :: Int,
fooName :: String
} deriving (Show)
It seems to me that something along these lines would be more attractive:
data Foo = Foo id :: Int
name :: String
deriving (Show)
I'm sure there must be a good reason I'm missing, but why was the C-like record syntax adopted over a cleaner layout-based approach?
Secondly, is there anything in the pipeline to solve the namespace problem, so we can write id foo instead of fooID foo in future versions of Haskell? (Apart from the longwinded type class based workarounds currently available.)

Well if no one else is going to try, then I'll take another (slightly more carefully researched) stab at answering these questions.
tl;dr
Question 1: That's just the way the dice rolled. It was a circumstantial choice and it stuck.
Question 2: Yes (sorta). Several different parties have certainly been thinking about the issue.
Read on for a very longwinded explanation for each answer, based around links and quotes that I found to be relevant and interesting.
Why was the C-like record syntax adopted over a cleaner layout-based approach?
Microsoft researchers wrote a History of Haskell paper. Section 5.6 talks about records. I'll quote the first tiny bit, which is insightful:
One of the most obvious omissions from early versions of Haskell
was the absence of records, offering named fields. Given that
records are extremely useful in practice, why were they omitted?
The Microsofties then answer their own question
The strongest reason seems to have been that there was no obvious “right” design.
You can read the paper yourself for the details, but they say Haskell eventually adopted record syntax due to "pressure for named fields in data structures".
By the time the Haskell 1.3 design was under way, in 1993, the user
pressure for named fields in data structures was strong, so the committee eventually adopted a minimalist design...
You ask why it is why it is? Well, from what I understand, if the early Haskellers had their way, we might've never had record syntax in the first place. The idea was apparently pushed onto Haskell by people who were already used to C-like syntax, and were more interested in getting C-like things into Haskell rather than doing things "the Haskell way". (Yes, I realize this is an extremely subjective interpretation. I could be dead wrong, but in the absence of better answers, this is the best conclusion I can draw.)
Is there anything in the pipeline to solve the namespace problem?
First of all, not everyone feels it is a problem. A few weeks ago, a Racket enthusiast explained to me (and others) that having different functions with the same name was a bad idea, because it complicates analysis of "what does the function named ___ do?" It is not, in fact, one function, but many. The idea can be extra troublesome for Haskell, since it complicates type inference.
On a slight tangent, the Microsofties have interesting things to say about Haskell's typeclasses:
It was a happy coincidence
of timing that Wadler and Blott happened to produce this key idea
at just the moment when the language design was still in flux.
Don't forget that Haskell was young once. Some decisions were made simply because they were made.
Anyways, there are a few interesting ways that this "problem" could be dealt with:
Type Directed Name Resolution, a proposed modification to Haskell (mentioned in comments above). Just read that page to see that it touches a lot of areas of the language. All in all, it ain't a bad idea. A lot of thought has been put into it so that it won't clash with stuff. However, it will still require significantly more attention to get it into the now-(more-)mature Haskell language.
Another Microsoft paper, OO Haskell, specifically proposes an extension to the Haskell language to support "ad hoc overloading". It's rather complicated, so you'll just have to check out Section 4 for yourself. The gist of it is to automatically (?) infer "Has" types, and to add an additional step to type checking that they call "improvement", vaguely outlined in the selective quotes that follow:
Given the class constraint Has_m (Int -> C -> r) there is
only one instance for m that matches this constraint...Since there is exactly one choice, we should make it now, and that in turn
fixes r to be Int. Hence we get the expected type for f:
f :: C -> Int -> IO Int...[this] is simply a
design choice, and one based on the idea that the class Has_m is closed
Apologies for the incoherent quoting; if that helps you at all, then great, otherwise just go read the paper. It's a complicated (but convincing) idea.
Chris Done has used Template Haskell to provide duck typing in Haskell in a vaguely similar manner to the OO Haskell paper (using "Has" types). A few interactive session samples from his site:
λ> flap ^. donald
*Flap flap flap*
λ> flap ^. chris
I'm flapping my arms!
fly :: (Has Flap duck) => duck -> IO ()
fly duck = do go; go; go where go = flap ^. duck
λ> fly donald
*Flap flap flap*
*Flap flap flap*
*Flap flap flap*
This requires a little boilerplate/unusual syntax, and I personally would prefer to stick to typeclasses. But kudos to Chris Done for freely publishing his down-to-earth work in the area.

I just thought I'd add a link addressing the namespace issue. It seems that overloaded record fields for GHC are coming in GHC 7.10 (and are probably already in HEAD), using the OverloadedRecordFields extension.
This would allow for syntax such as
data Person = Person { id :: Int, name :: String }
data Company { name :: String, employees :: [Person] }
companyNames :: Company -> [String]
companyNames c = name c : map name (employees c)

[edit] This answer is just some random thoughts of mine on the matter. I recommend my other answer over this one, because for that answer I took a lot more time to look up and reference other people's work.
Record syntax
Taking a few stabs in the dark: your "layout-based" proposed syntax looks a lot like non-record-syntax data declarations; that might cause confusion for parsing (?)
--record
data Foo = Foo {i :: Int, s :: String} deriving (Show)
--non-record
data Foo = Foo Int String deriving (Show)
--new-record
data Foo = Foo i :: Int, s :: String deriving (Show)
--record
data LotsaInts = LI {a,b,c,i,j,k :: Int}
--new-record
data LostaInts = LI a,b,c,i,j,k :: Int
In the latter case, what exactly is :: Int applied to? The whole data declaration?
Declarations with the record syntax (currently) are similar to construction and update syntax. Layout-based syntax would not be clearer for these cases; how do you parse those extra = signs?
let f1 = Foo {s = "foo1", i = 1}
let f2 = f1 {s = "foo2"}
let f1 = Foo s = "foo1", i = "foo2"
let f2 = f1 s = "foo2"
How do you know f1 s is a record update, as opposed to a function application?
Namespacing
What if you want to intermingle usage of your class-defined id with the Prelude's id? How do you specify which one you're using? Can you think of any better way than qualified imports and/or the hiding keyword?
import Prelude hiding (id)
data Foo = Foo {a,b,c,i,j,k :: Int, s :: String}
deriving (Show)
id = i
ghci> :l data.hs
ghci> let foo = Foo 1 2 3 4 5 6 "foo"
ghci> id foo
4
ghci> Prelude.id f1
Foo {a = 1, b = 2, c = 3, i = 4, j = 5, k = 6, s = "foo"}
These aren't great answers, but they're the best I've got. I personally don't think record syntax is that ugly. I do feel there is room for improvement with the namespacing/modules stuff, but I have no idea how to make it better.

As of June 2021, it has been half-implemented by three opt-in language extensions and counting:
https://gitlab.haskell.org/ghc/ghc/-/wikis/records/overloaded-record-fields
Even with all three extensions enabled, basic stuff like
len2 :: Point -> Double
len2 p = (x p)^2 + (y p)^2 -- fails!
still won't work if, say, there's a Quaternion type with x and y fields as well. You would have to do this:
len2 :: Point -> Double
len2 p = (x (p :: Point))^2 + (y (p :: Point))^2
or this:
len2 :: Point -> Double
len2 (MkPoint {x = px, y = py}) = px^2 + py^2
Even if the first example did work, it would still be opt-in, so odds are that it will be another two decades before the extension is widely adopted by the libraries that any real application must rely on.
It's ironic when a deal breaker like this is not an issue in a language like C.
One point of interest, though: Idris 2 has actually fixed this. It isn't really ready yet either, though.

Related

understanding data structure in Haskell

I have a problem with a homework (the topic is : "functional data structures").
Please understand that I don't want anyone to solve my homework.
I just have a problem with understanding the structure of this :
data Heap e t = Heap {
empty :: t e,
insert :: e -> t e -> t e,
findMin :: t e -> Maybe e,
deleteMin :: t e -> Maybe (t e),
merge :: t e -> t e -> t e,
contains :: e -> t e -> Maybe Int
}
In my understanding "empty" "insert" and so on are functions which can applied to "Heap"-type data.
Now I just want to understand how that "Heap"thing looks like.
So I was typing things like :
a = Heap 42 42
But I get errors I can't really work with.
Maybe it is a dumb question and I'm just stuck at this point for no reason, but it is killing me at the moment.
Thankful to any help
If you truly wish to understand that type, you need to understand a few requisites first.
types and values (and functions)
Firstly, you need to understand what types and values are. I'm going to assume you understand this. You understand, for example, the separation between "hello" as a value and its type, String and you understand clearly what it means when I say a = "hello" :: String and:
a :: String
a = "hello"
If you don't understand that, then you need to research values and types in Haskell. There are a myriad of books that can help here, such as this one, which I helped to author: http://happylearnhaskelltutorial.com
I'm also going to assume you understand what functions and currying are, and how to use both of them.
polymorphic types
Secondly, as your example contains type variables, you'll need to understand what they are. That is, you need to understand what polymoprhic types are. So, for example, Maybe a, or Either a b, and you'll need to understand how Maybe String is different to Maybe Int and what Num a => [a] and even things like what Num a => [Maybe a] is.
Again, there are many free or paid books that can help, the example above covers this, too.
algebraic data types
Next up is algebraic data types. This is a pretty amazingly cool feature that Haskell has. Haskell-like languages such as Elm and Idris have it as well as others like Rust, too. It lets you define your own data types. These aren't just things like Structs in other languages, and yeah, they can even contain functions.
Maybe is actually an example of an algebraic data types. If you understand these, you'll know that:
data Direction = North | South | East | West
defines a data type called Direction whose values can only be one of North, South, East or West, and you'll know that you can also use the polymorhpic type variables above to parameterise your types like so:
data Tree a = EmptyNode | Node (Tree a) (Tree a)
which uses both optionality (as in the sum type of Direction above) as well as parameterization.
In addition to this, you can also have multiple types in each value. These are called product types, and Haskell's algebraic datatypes can be expressed as a combination of Sum types that can contain Product types. For example:
type Location = (Float, Float)
data ShapeNode = StringNode Location String | CircleNode Location Float | SquareNode Location Float Float
That is, each value can be one of StringNode, CircleNode or SquareNode, and in each case there are a different set of fields given to each value. To create a StringNode, for example, you'd need to pass the values of it constructor like this: StringNode (10.0, 5.3) "A String".
Again, the freely available books will go into much more detail about these things, but we're moving in the direction of getting more than a basic understanding of Haskell now.
Finally, in order to fully understand your example, you'll need to know about...
record types
Record types are the same as product types above, except that the fields are labelled rather than being anonymous. So, you could define the shape node data type like this, instead:
type Location = (Float, Float)
data ShapeNode
= StringNode { stringLocation :: Location, stringData :: String }
| CircleNode { circleLocation :: Location, radius :: Float }
| SquareNode { squareLocation :: Location, length :: Float, height :: Float }
Each field is named, and you can't repeat the same name inside data values.
All that you need in addition to this to understand the above example is to realise your example contains all of these things together, along with the fact that you have functions as your record field values in the data type you have.
It's a good idea to thoroughly flesh out your understanding and not skip any steps, then you'll be able to follow these kinds of things much more easily in the future. :) I wish you luck!
Heap is a record with six elements. In order to create a value of that type, you must supply all six elements. Assuming that you have appropriate values and functions, you can create a value like this:
myHeap = Heap myEmpty myInsert myFindMin myDeleteMin myMerge myContains
The doesn't seem like idiomatic Haskell design, however. Why not define generic functions independent of the data, or, if they must be bundled together, a typeclass?

Redundancy regarding product types and tuples in Haskell

In Haskell you have product types and you have tuples.
You use tuples if you don't want to associate a dedicated type with the value, and you can use product types if you wish to do so.
However I feel there is redundancy in the notation of product types
data Foo = Foo (String, Int, Char)
data Bar = Bar String Int Char
Why are there both kinds of notations? Is there any case where you would prefer one the other?
I guess you can't use record notation when using tuples, but that's just a convenience problem. Another thing might be the notion of order in tuples, as opposed to product types, but I think that's just due to the naming of the functions fst and snd.
#chi's answer is about the technical differences in terms of Haskell's evaluation model. I hope to give you some insight into the philosophy of this sort of typed programming.
In category theory we generally work with objects "up to isomorphism". Your Bar is of course isomorphic to (String, Int, Char), so from a categorical perspective they're the same thing.
bar_tuple :: Iso' Bar (String, Int, Char)
bar_tuple = iso to from
where to (Bar s i c) = (s, i, c)
from (s, i, c) = Bar s i c
In some sense tuples are a Platonic form of product type, in that they have no meaning beyond being a collection of disparate values. All the other product types can be mapped to and from a plain old tuple.
So why not use tuples everywhere, when all Haskell types ultimately boil down to a sum of products? It's about communication. As Martin Fowler says,
Any fool can write code that a computer can understand. Good programmers write code that humans can understand.
Names are important! Writing down a custom product type like
data Customer = Customer { name :: String, address :: String }
imbues the type Customer with meaning to the person reading the code, unlike (String, String) which just means "two strings".
Custom types are particularly useful when you want to enforce invariants by hiding the representation of your data and using smart constructors:
newtype NonEmpty a = NonEmpty [a]
nonEmpty :: [a] -> Maybe (NonEmpty a)
nonEmpty [] = Nothing
nonEmpty xs = Just (NonEmpty xs)
Now, if you don't export the NonEmpty constructor, you can force people to go through the nonEmpty smart constructor. If someone hands you a NonEmpty value you may safely assume that it has at least one element.
You can of course represent Customer as a tuple under the hood and expose evocatively-named field accessors,
newtype Customer = Bar (String, String)
name, address :: Customer -> String
name (Customer (n, a)) = n
address (Customer (n, a)) = a
but this doesn't really buy you much, except that it's now cheaper to convert Customer to a tuple (if, say, you're writing performance-sensitive code that works with a tuple-oriented API).
If your code is intended to solve a particular problem - which of course is the whole point of writing code - it pays to not just solve the problem, but make it look like you've solved it too. Someone - maybe you in a couple of years - is going to have to read this code and understand it with no a priori knowledge of how it works. Custom types are a very important communication tool in this regard.
The type
data Foo = Foo (String, Int, Char)
represents a double-lifted tuple. It values comprise
undefined
Foo undefined
Foo (undefined, undefined, undefined)
etc.
This is usually troublesome. Because of this, it's rare to see such definitions in actual code. We either have plain data types
data Foo = Foo String Int Char
or newtypes
newtype Foo = Foo (String, Int, Char)
The newtype can be just as inconvenient to use, but at least it
does not double-lift the tuple: undefined and Foo undefined are now equal values.
The newtype also provides zero-cost conversion between a plain tuple and Foo, in both directions.
You can see such newtypes in use e.g. when the programmer needs a different instance for some type class, than the one already associated with the tuple. Or, perhaps, it is used in a "smart constructor" idiom.
I would not expect the pattern used in Foo to be frequent. There is slight difference in what the constructor acts like: Foo :: (String, Int, Char) -> Foo as opposed to Bar :: String -> Int -> Char -> Bar. Then Foo undefined and Foo (undefined, ..., ...) are strictly speaking different things, whereas you miss one level of undefinedness in Bar.

Using different Ordering for Sets

I was reading a Chapter 2 of Purely Functional Data Structures, which talks about unordered sets implemented as binary search trees. The code is written in ML, and ends up showing a signature ORDERED and a functor UnbalancedSet(Element: ORDERED): SET. Coming from more of a C++ background, this makes sense to me; custom comparison function objects form part of the type and can be passed in at construction time, and this seems fairly analogous to the ML functor way of doing things.
When it comes to Haskell, it seems the behavior depends only on the Ord instance, so if I wanted to have a set that had its order reversed, it seems like I'd have to use a newtype instance, e.g.
newtype ReverseInt = ReverseInt Int deriving (Eq, Show)
instance Ord ReverseInt where
compare (ReverseInt a) (ReverseInt b)
| a == b = EQ
| a < b = GT
| a > b = LT
which I could then use in a set:
let x = Set.fromList $ map ReverseInt [1..5]
Is there any better way of doing this sort of thing that doesn't resort to using newtype to create a different Ord instance?
No, this is really the way to go. Yes, having a newtype is sometimes annoying but you get some big benefits:
When you see a Set a and you know a, you immediately know what type of comparison it uses (sort of the same way that purity makes code more readable by not making you have to trace execution). You don't have to know where that Set a comes from.
For many cases, you can coerce your way through multiple newtypes at once. For example, I can turn xs = [1,2,3] :: Int into ys = [ReverseInt 1, ReverseInt 2, ReverseInt 3] :: [ReverseInt] just using ys = coerce xs :: [ReverseInt]. Unfortunately, that isn't the case for Set (and it shouldn't - you'd need the coercion function to be monotonic to not screw up the data structure invariants, and there is not yet a way to express that in the type system).
newtypes end up being more composable than you expect. For example, the ReverseInt type you made already exists in a form that generalizes to reversing any type with an Ord constraint: it is called Down. To be explicit, you could use Down Int instead of ReversedInt, and you get the instance you wrote out for free!
Of course, if you still feel very strongly about this, nothing is stopping you from writing your version of Set which has to have a field which is the comparison function it uses. Something like
data Set a = Set { comparisionKey :: a -> a -> Ordering
, ...
}
Then, every time you make a Set, you would have to pass in the comparison key.

Why aren't there existentially quantified type variables in GHC Haskell

There are universally quantified type variables, and there are existentially quantified data types. However, despite that people give pseudocode of the form exists a. Int -> a to help explain concepts sometimes, it doesn't seem like a compiler extension that there's any real interest in. Is this just a "there isn't much value in adding this" kind of thing (because it does seem valuable to me), or is there a problem like undecidability that's makes it truly impossible.
EDIT:
I've marked viorior's answer as correct because it seems like it is probably the actual reason why this was not included. I'd like to add some additional commentary though just in case anyone would want to help clarify this more.
As requested in the comments, I'll give an example of why I would consider this useful. Suppose we have a data type as follows:
data Person a = Person
{ age: Int
, height: Double
, weight: Int
, name: a
}
So we choose parameterize over a, which is a naming convention (I know that it probably makes more sense in this example to make a NamingConvention ADT with appropriate data constructors for the American "first,middle,last", the hispanic "name,paternal name,maternal name", etc. But for now, just go with this).
So, there are several functions we see that basically ignore the type that Person is parameterized over. Examples would be
age :: Person a -> Int
height :: Person a -> Double
weight :: Person a -> Int
And any function built on top of these could similarly ignore the a type. For example:
atRiskForDiabetes :: Person a -> Bool
atRiskForDiabetes p = age p + weight p > 200
--Clearly, I am not actually a doctor
Now, if we have a heterogeneous list of people (of type [exists a. Person a]), we would like to be able to map some of our functions over the list. Of course, there are some useless ways to map:
heteroList :: [exists a. Person a]
heteroList = [Person 20 30.0 170 "Bob Jones", Person 50 32.0 140 3451115332]
extractedNames = map name heteroList
In this example, extractedNames is of course useless because it has type [exists a. a]. However, if we use our other functions:
totalWeight :: [exists a. Person a] -> Int
totalWeight = sum . map age
numberAtRisk :: [exists a. Person a] -> Int
numberAtRisk = length . filter id . map atRiskForDiabetes
Now, we have something useful that operates over a heterogeneous collection (And, we didn't even involve typeclasses). Notice that we were able to reuse our existing functions. Using an existential data type would go as follows:
data SomePerson = forall a. SomePerson (Person a) --fixed, thanks viorior
But now, how can we use age and atRiskForDiabetes? We can't. I think that you would have to do something like this:
someAge :: SomePerson -> Int
someAge (SomePerson p) = age p
Which is really lame because you have to rewrite all of your combinators for a new type. It gets even worse if you want to do this with a data type that's parameterized over several type variables. Imagine this:
somewhatHeteroPipeList :: forall a b. [exists c d. Pipe a b c d]
I won't explain this line of thought any further, but just notice that you'd be rewriting a lot of combinators to do anything like this using just existential data types.
That being said, I hope I've give a mildly convincing use that this could be useful. If it doesn't seem useful (or if the example seems too contrived), feel free to let me know. Also, since I am firstly a programmer and have no training in type theory, it's a little difficult for me to see how to use Skolem's theorum (as posted by viorior) here. If anyone could show me how to apply it to the Person a example I gave, I would be very grateful. Thanks.
It is unnecessary.
By Skolem's Theorem we could convert existential quantifier into universal quantifier with higher rank types:
(∃b. F(b)) -> Int <===> ∀b. (F(b) -> Int)
Every existentially quantified type of rank n+1 can be encoded as a universally quantified type of rank n
Existentially quantified types are available in GHC, so the question is predicated on a false assumption.

Self-reference in data structure – Checking for equality

In my initial attempt at creating a disjoint set data structure I created a Point data type with a parent pointer to another Point:
data Point a = Point
{ _value :: a
, _parent :: Point a
, _rank :: Int
}
To create a singleton set, a Point is created that has itself as its parent (I believe this is called tying the knot):
makeSet' :: a -> Point a
makeSet' x = let p = Point x p 0 in p
Now, when I wanted to write findSet (i.e. follow parent pointers until you find a Point whose parent is itself) I hit a problem: Is it possible to check if this is the case? A naïve Eq instance would of course loop infinitely – but is this check conceptually possible to write?
(I ended up using a Maybe Point for the parent field, see my other question.)
No, what you are asking for is known in the Haskell world as referential identity: the idea that for two values of a certain type, you can check whether they are the same value in memory or two separate values that happen to have exactly the same properties.
For your example, you can ask yourself whether you would consider the following two values the same or not:
pl1 :: Point Int
pl1 = Point 0 (Point 0 pl1 1) 1
pl2 :: Point Int
pl2 = Point 0 pl2 1
Haskell considers both values completely equal. I.e. Haskell does not support referential identity. One of the reasons for this is that it would violate other features that Haskell does support. It is for example the case in Haskell that we can always replace a reference to a function by that function's implementation without changing the meaning (equational reasoning). For example if we take the implementation of pl2: Point 0 pl2 1 and replace pl2 by its definition, we get Point 0 (Point 0 pl2 1) 1, making pl2's definition equivalent to pl1's. This shows that Haskell cannot allow you to observe the difference between pl1 and pl2 without violating properties implied by equational reasoning.
You could use unsafe features like unsafePerformIO (as suggested above) to work around the lack of referential identity in Haskell, but you should know that you are then breaking core principles of Haskell and you may observe weird bugs when GHC starts optimizing (e.g. inlining) your code. It is better to use a different representation of your data, e.g. the one you mentioned using a Maybe Point value.
You could try to do this using StableName (or StablePtr) and unsafePerformIO, but it seems like a worse idea than Maybe Point for this case.
In order the observed the effect you're most likely interested in—pointer equality instead of value equality—you most likely will want to write your algorithm in the ST monad. The ST monad can be thought of kind of like "locally impure IO, globally pure API" though, by the nature of Union Find, you'll likely have to dip the entire lookup process into the impure section of your code.
Fortunately, monads still contain this impurity fairly well.
There is also already an implementation of Union Find in Haskell which uses three methods to achieve the kind of identity you're looking for: using the IO monad, using IntSets, and using the ST monad.
In Haskell data is immutable (except IO a, IORef a, ST a, STRef a, ..) and data is function.
So, any data is "singleton"
When you type
data C = C {a, b :: Int}
changeCa :: C -> Int -> C
changeCa c newa = c {a = newa}
You don't change a variable, you destroy old data and create a new one.
You could try use links and pointers, but it is useless and complex
And last,
data Point a = Point {p :: Point a}
is a infinite list-like data = Point { p=Point { p=Point { p=Point { ....

Resources