Why doesn't Haskell/GHC support record name overloading - haskell

I am a Haskell newbie. I have noticed that Haskell does not support record name overloading:
-- Records.hs
data Employee = Employee
  { firstName :: String
  , lastName :: String
  , ssn :: String
  } deriving (Show, Eq)
data Manager = Manager
  { firstName :: String
  , lastName :: String
  , ssn :: String
  , subordinates :: [Employee]
  } deriving (Show, Eq)
When I compile this I get:
[1 of 1] Compiling Main ( Records.hs, Records.o )
Records.hs:10:5:
Multiple declarations of `firstName'
Declared at: Records.hs:4:5
Records.hs:10:5
Records.hs:11:5:
Multiple declarations of `lastName'
Declared at: Records.hs:5:5
Records.hs:11:5
Records.hs:12:5:
Multiple declarations of `ssn'
Declared at: Records.hs:6:5
Records.hs:12:5
Given the "strength" of the Haskell type system, it seems like it should be easy for the compiler to determine which field to access in
emp = Employee "Joe" "Smith" "111-22-3333"
man = Manager "Mary" "Jones" "333-22-1111" [emp]
firstName man
firstName emp
Is there some issue that I am not seeing? I know that the Haskell Report does not allow this, but why not?

Historical reasons. There have been many competing designs for better record systems for Haskell -- so many in fact, that no consensus could be reached. Yet.

The current record system is not very sophisticated. It's mostly some syntactic sugar for things you could do with boilerplate if there was no record syntax.
In particular, this:
data Employee = Employee
  { firstName :: String
  , lastName :: String
  , ssn :: String
  } deriving (Show, Eq)
generates (among other things) a function firstName :: Employee -> String.
If you also allow in the same module this type:
data Manager = Manager
  { firstName :: String
  , lastName :: String
  , ssn :: String
  , subordinates :: [Employee]
  } deriving (Show, Eq)
then what would be the type of the firstName function?
It would have to be two separate functions overloading the same name, which Haskell does not allow. Unless you imagine that this would implicitly generate a typeclass and make instances of it for everything with a field named firstName (gets messy in the general case, when the fields could have different types), then Haskell's current record system isn't going to be able to support multiple fields with the same name in the same module. Haskell doesn't even attempt to do any such thing at present.
It could, of course, be done better. But there are some tricky problems to solve, and so far no proposed solution has convinced everyone that it is the most promising direction to move in.
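To make the typeclass idea mentioned above concrete, here is what it might look like if written out by hand. This is only a sketch, not something GHC generates; the class name HasFirstName and the prefixed field names are made up for illustration.

```haskell
-- Hand-written version of the "implicit typeclass" idea.
-- HasFirstName and the prefixed selectors are hypothetical names.

data Employee = Employee
  { employeeFirstName :: String
  , employeeLastName  :: String
  , employeeSsn       :: String
  } deriving (Show, Eq)

data Manager = Manager
  { managerFirstName :: String
  , managerLastName  :: String
  , managerSsn       :: String
  , subordinates     :: [Employee]
  } deriving (Show, Eq)

-- One overloaded accessor, resolved by the type of its argument.
class HasFirstName r where
  firstName :: r -> String

instance HasFirstName Employee where
  firstName = employeeFirstName

instance HasFirstName Manager where
  firstName = managerFirstName

emp :: Employee
emp = Employee "Joe" "Smith" "111-22-3333"

man :: Manager
man = Manager "Mary" "Jones" "333-22-1111" [emp]
```

With this in place both firstName man and firstName emp type check. The cost is that firstName is now a class method rather than a field, so it cannot be used in record construction or update syntax, and a field whose type differs between records would need a more elaborate class.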

One option to avoid this is to put your data types in different modules and use qualified imports. That way you can use the same field accessors on different data records and keep your code clean and more readable.
You can create one module for the employee, for example
module Model.Employee where
data Employee = Employee
  { firstName :: String
  , lastName :: String
  , ssn :: String
  } deriving (Show, Eq)
And one module for the Manager, for example:
module Model.Manager where
import Model.Employee (Employee)
data Manager = Manager
  { firstName :: String
  , lastName :: String
  , ssn :: String
  , subordinates :: [Employee]
  } deriving (Show, Eq)
And then wherever you want to use these two data types you can import them qualified and access them as follows:
import Model.Employee (Employee)
import qualified Model.Employee as Employee
import Model.Manager (Manager)
import qualified Model.Manager as Manager
emp = Employee "Joe" "Smith" "111-22-3333"
man = Manager "Mary" "Jones" "333-22-1111" [emp]
name1 = Manager.firstName man
name2 = Employee.firstName emp
Keep in mind that you are still using two different data types, so Manager.firstName is a different function from Employee.firstName, even though you know that both data types represent a person and each person has a first name. How far you go in abstracting the data types is up to you; for example, you could also create a Person data type from those "attribute collections".
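One possible reading of that last remark about a shared Person type is sketched below. It is an assumption about how the abstraction might look, not part of the modules above; the field names employeePerson and managerPerson are invented here.

```haskell
-- Shared attributes live in one record, so firstName is declared only once.
data Person = Person
  { firstName :: String
  , lastName  :: String
  , ssn       :: String
  } deriving (Show, Eq)

-- employeePerson and managerPerson are hypothetical field names.
newtype Employee = Employee
  { employeePerson :: Person
  } deriving (Show, Eq)

data Manager = Manager
  { managerPerson :: Person
  , subordinates  :: [Employee]
  } deriving (Show, Eq)

emp :: Employee
emp = Employee (Person "Joe" "Smith" "111-22-3333")

man :: Manager
man = Manager (Person "Mary" "Jones" "333-22-1111") [emp]

-- The same selector serves both types after stepping into the Person.
name1, name2 :: String
name1 = firstName (managerPerson man)
name2 = firstName (employeePerson emp)
```

Everything fits in one module because firstName exists only once. The trade-off is an extra layer of nesting on every access.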

Related

How do I create several related data types in haskell?

I have a User type that represents a user saved in the database. However, when displaying users, I only want to return a subset of these fields so I made a different type without the hash. When creating a user, a password will be provided instead of a hash, so I made another type for that.
This is clearly the worst, because there is tons of duplication between my types. Is there a better way to create several related types that all share some fields, but add some fields and remove others?
{-# LANGUAGE DeriveGeneric #-}
import GHC.Generics (Generic)
data User = User {
  id :: String,
  email :: String,
  hash :: String,
  institutionId :: String
} deriving (Show, Generic)
data UserPrintable = UserPrintable {
  email :: String,
  id :: String,
  institutionId :: String
} deriving (Generic)
data UserCreatable = UserCreatable {
  email :: String,
  hash :: String,
  institutionId :: String
} deriving (Generic)
data UserFromRequest = UserFromRequest {
  email :: String,
  institutionId :: String,
  password :: String
} deriving (Generic)
-- UGHHHHHHHHHHH
In this case, I think you can replace your various User types with functions. So instead of UserFromRequest, have:
userFromRequest :: Email -> InstitutionId -> String -> User
Note how you can also make separate types for Email and InstitutionId, which will help you avoid a bunch of annoying mistakes. This serves the same purpose as taking a record with labelled fields as an argument, while also adding a bit of extra static safety. You can just implement these as newtypes:
newtype Email = Email String deriving (Show, Eq)
Similarly, we can replace UserPrintable with showUser.
UserCreatable might be a bit awkward, however, depending on how you need to use it. If all you ever do with it is take it as an argument and create a database row, then you can refactor it into a function the same way. But if you actually need the type for a bunch of things, this isn't a good solution.
In this second case, you have a couple of decent options. One would be to just make id a Maybe and check it each time. A better one would be to create a generic type WithId a which just adds an id field to anything:
data WithId a = WithId { id :: DatabaseId, content :: a }
Then have a User type with no id and have your database functions work with a WithId User.
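Putting these pieces together might look roughly like the sketch below. Several things here are assumptions made for illustration: the id field is renamed rowId so it does not clash with Prelude's id, DatabaseId is taken to wrap a String, and fakeHash is only a stand-in that does no real hashing.

```haskell
-- Newtypes keep the String arguments from being mixed up.
newtype Email         = Email String         deriving (Show, Eq)
newtype InstitutionId = InstitutionId String deriving (Show, Eq)
newtype DatabaseId    = DatabaseId String    deriving (Show, Eq)

-- A user as stored, without its database id.
data User = User
  { email         :: Email
  , hash          :: String
  , institutionId :: InstitutionId
  } deriving (Show, Eq)

-- Adds an id to any content type. The field is called rowId here
-- (an assumption) so that it does not clash with Prelude's id.
data WithId a = WithId
  { rowId   :: DatabaseId
  , content :: a
  } deriving (Show, Eq)

-- Placeholder only: NOT a real password hash.
-- Real code would call a proper password hashing library instead.
fakeHash :: String -> String
fakeHash password = "hashed:" ++ password

-- Replaces the UserFromRequest record with a function.
userFromRequest :: Email -> InstitutionId -> String -> User
userFromRequest e i password = User e (fakeHash password) i

-- Replaces UserPrintable: render a stored user without the hash.
showUser :: WithId User -> String
showUser (WithId (DatabaseId i) u) =
  i ++ " " ++ addr ++ " " ++ inst
  where
    Email addr         = email u
    InstitutionId inst = institutionId u
```

Only one record type carries the shared fields. The variations become functions or a wrapper around it.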

Haskell: Create a list of only certain "kind" of type?

I've been working through both Learn You a Haskell and Beginning Haskell and have come on an interesting problem. To preface, I'm normally a C++ programmer, so forgive me if I have no idea what I'm talking about.
One of the exercises in Beginning Haskell has me create a type Client, which can be a Government organization, Company, or Individual. I decided to try out record syntax for this.
data Client = GovOrg { name :: String }
            | Company { name :: String,
                        id :: Integer,
                        contact :: String,
                        position :: String
                      }
            | Individual { fullName :: Person,
                           offers :: Bool
                         }
            deriving Show
data Person = Person { firstName :: String,
                       lastName :: String,
                       gender :: Gender
                     }
            deriving Show
data Gender = Male | Female | Unknown
            deriving Show
This is used for an exercise where given a list of Clients, I have to find how many of each gender are in the list. I started by filtering to get a list of just Individuals since only they have the Gender type, but my method seems to be completely wrong:
listIndividuals :: [Client] -> [Client]
listIndividuals xs = filter (\x -> x == Individual) xs
How would I get this functionality, where I can check what "kind" of Client something is? Also, for the record syntax, how is my coding style? Too inconsistent?
First of all, I would recommend not using record syntax on a type with several constructors, because you end up with partial accessor functions. For example, it is perfectly legal to write position (Individual (Person "John" "Doe" Male) True), but it will throw a runtime error. Instead, consider something more like
data GovClient = GovClient {
  govName :: String
} deriving Show
data CompanyClient = CompanyClient {
  companyName :: String,
  companyID :: Integer, -- Also, don't overwrite existing names, `id` is built-in function
  companyContact :: String,
  companyPosition :: String
} deriving Show
data IndividualClient = IndividualClient {
  indvFullName :: Person,
  indvOffers :: Bool
} deriving Show
Then you can have
data Client
  = GovOrg GovClient
  | Company CompanyClient
  | Individual IndividualClient
  deriving (Show)
Now you can also define your function as
isIndividualClient :: Client -> Bool
isIndividualClient (Individual _) = True
isIndividualClient _ = False
listIndividuals :: [Client] -> [Client]
listIndividuals clients = filter isIndividualClient clients
Or the more point-free form of
listIndividuals = filter isIndividualClient
Here, in order to make the decision I've simply used pattern matching in a separate function to determine which of Client's constructors was used. Now you get the full power of record and algebraic types, with just a hair more code to worry about, but a lot more safety. You'll never accidentally call a function expecting a government client on an individual client, for example, because it wouldn't type check, whereas with your current implementation it would be more than possible.
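If you want the unwrapped IndividualClient values rather than a filtered list of Client, a list comprehension with a constructor pattern can filter and unwrap in one step. The sketch below also counts genders, as the exercise asks. The names individuals and countGender are hypothetical, the deriving Eq on Gender is an addition so that genders can be compared, and the other two constructors are cut down to a bare name to keep the example short.

```haskell
data Gender = Male | Female | Unknown
  deriving (Show, Eq)  -- Eq is added here so genders can be compared

data Person = Person
  { firstName :: String
  , lastName  :: String
  , gender    :: Gender
  } deriving Show

data IndividualClient = IndividualClient
  { indvFullName :: Person
  , indvOffers   :: Bool
  } deriving Show

-- The other two payloads are reduced to a name for brevity.
data Client
  = GovOrg String
  | Company String
  | Individual IndividualClient
  deriving Show

-- Elements that do not match the Individual pattern are skipped.
individuals :: [Client] -> [IndividualClient]
individuals clients = [ic | Individual ic <- clients]

-- Number of individual clients with the given gender.
countGender :: Gender -> [Client] -> Int
countGender g clients =
  length [() | ic <- individuals clients, gender (indvFullName ic) == g]
```

A failed pattern match in a list comprehension generator just drops that element, so no separate predicate function is needed here.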
If you're concerned with the longer names, I would recommend eventually looking into the lens library that is designed to help you manipulate complex trees of record types with relative ease.
With your current implementation, you could also do something pretty similar to the final solution:
isIndividualClient :: Client -> Bool
isIndividualClient (Individual _ _) = True
isIndividualClient _ = False
listIndividuals :: [Client] -> [Client]
listIndividuals clients = filter isIndividualClient clients
The main difference here is that Individual takes two fields, so the pattern needs two _ wildcards, and listIndividuals has the type [Client] -> [Client].

Accessing members of a custom data type in Haskell

Say I have the following custom data type and function in Haskell:
data Person = Person { first_name :: String,
                       last_name :: String,
                       age :: Int
                     } deriving (Eq, Ord, Show)
If I want to create a function print_age to print a Person's age, like so: print_age (Person "John" "Smith" 21) , how would I write print_age to access the age parameter? I'm an Object Oriented guy, so I'm out of my element here. I'm basically looking for the equivalent of Person.age.
Function application is prefix, so age person would correspond to the person.age() common in OOP languages. The print_age function could be defined pointfree by function composition
print_age = print . age
or point-full
print_age person = print (age person)
This is called record syntax, LYAH has a good section on it.
When a datatype is defined with records, Haskell automatically defines functions with the same name as the record to act as accessors, so in this case age is the accessor for the age field (it has type Person -> Int), and similarly for first_name and last_name.
These are normal Haskell functions and so are called like age person or first_name person.
In addition to the age function mentioned in other answers, it is sometimes convenient to use pattern matching.
print_age Person { age = a } = {- the a variable contains the person's age -}
There is a pretty innocuous extension that allows you to skip the naming bit:
{-# LANGUAGE NamedFieldPuns #-}
print_age Person { age } = {- the age variable contains the person's age -}
...and another, viewed with varying degrees of distrust by various community members, which allows you to even skip saying which fields you want to bring into scope:
{-# LANGUAGE RecordWildCards #-}
print_age Person { .. } = {- first_name, last_name, and age are all defined -}
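For reference, here are the variants above filled in so that they compile together. This is just a sketch; the names ageByPattern, ageByPun and describe are made up so the versions can sit side by side in one module.

```haskell
{-# LANGUAGE NamedFieldPuns  #-}
{-# LANGUAGE RecordWildCards #-}

data Person = Person
  { first_name :: String
  , last_name  :: String
  , age        :: Int
  } deriving (Eq, Ord, Show)

-- Plain accessor function applied to the record.
print_age :: Person -> IO ()
print_age person = print (age person)

-- Explicit field pattern: a is bound to the age field.
ageByPattern :: Person -> Int
ageByPattern Person { age = a } = a

-- NamedFieldPuns: the local variable is named after the field.
ageByPun :: Person -> Int
ageByPun Person { age } = age

-- RecordWildCards: every field becomes a local variable.
describe :: Person -> String
describe Person { .. } =
  first_name ++ " " ++ last_name ++ " is " ++ show age
```

Inside ageByPun and describe, the name age refers to the Int taken from the record, not to the accessor function.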

Haskell -- any way to qualify or disambiguate record names?

I have two data types, which are used for hastache templates. It makes sense in my code to have two different types, both with a field named "name". This, of course, causes a conflict. It seems that there's a mechanism to disambiguate any calls to "name", but the actual definition causes problems. Is there any workaround, say letting the record field name be qualified?
data DeviceArray = DeviceArray
  { name :: String,
    bytes :: Int }
  deriving (Eq, Show, Data, Typeable)
data TemplateParams = TemplateParams
  { arrays :: [DeviceArray],
    input :: DeviceArray }
  deriving (Eq, Show, Data, Typeable)
data MakefileParams = MakefileParams
  { name :: String }
  deriving (Eq, Show, Data, Typeable)
i.e. if the fields are now used in code, they will be "DeviceArray.name" and "MakefileParams.name"?
As already noted, this isn't directly possible, but I'd like to say a couple things about proposed solutions:
If the two fields are clearly distinct, you'll want to always know which you're using anyway. By "clearly distinct" here I mean that there would never be a circumstance where it would make sense to do the same thing with either field. Given this, excess disambiguation isn't really unwelcome, so you'd want either qualified imports as the standard approach, or the field disambiguation extension if that's more to your taste. Or, as a very simplistic (and slightly ugly) option, just manually prefix the fields, e.g. deviceArrayName instead of just name.
If the two fields are in some sense the same thing, it makes sense to be able to treat them in a homogeneous way; ideally you could write a function polymorphic in choice of name field. In this case, one option is using a type class for "named things", with functions that let you access the name field on any appropriate type. A major downside here, besides a proliferation of trivial type constraints and possible headaches from the Dreaded Monomorphism Restriction, is that you also lose the ability to use the record syntax, which begins to defeat the whole point.
The other major option for similar fields, which I didn't see suggested yet, is to extract the name field out into a single parameterized type, e.g. data Named a = Named { name :: String, item :: a }. GHC itself uses this approach for source locations in syntax trees, and while it doesn't use record syntax the idea is the same. The downside here is that if you have a Named DeviceArray, accessing the bytes field now requires going through two layers of records. If you want to update the bytes field with a function, you're stuck with something like this:
addBytes b na = na { item = (item na) { bytes = b + bytes (item na) } }
Ugh. There are ways to mitigate the issue a bit, but they're still not ideal, to my mind. Cases like this are why I don't like record syntax in general. So, as a final option, some Template Haskell magic and the fclabels package:
{-# LANGUAGE TemplateHaskell #-}
import Control.Category
import Data.Record.Label
data Named a = Named
  { _name :: String,
    _namedItem :: a }
  deriving (Eq, Show, Data, Typeable)
data DeviceArray = DeviceArray { _bytes :: Int }
  deriving (Eq, Show, Data, Typeable)
data MakefileParams = MakefileParams { _makefileParams :: [MakeParam] }
  deriving (Eq, Show, Data, Typeable)
data MakeParam = MakeParam { paramText :: String }
  deriving (Eq, Show, Data, Typeable)
$(mkLabels [''Named, ''DeviceArray, ''MakefileParams, ''MakeParam])
Don't mind the MakeParam business, I just needed a field on there to do something with. Anyway, now you can modify fields like this:
addBytes b = modL (namedItem >>> bytes) (b +)
nubParams = modL (namedItem >>> makefileParams) nub
You could also name bytes something like bytesInternal and then export an accessor bytes = namedItem >>> bytesInternal if you like.
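For comparison, the nested update from earlier can also be tamed a little without Template Haskell by writing small modifier functions by hand. This is a sketch of one such mitigation, using the plain Named type with record syntax; modifyItem and modifyBytes are hypothetical helper names and are not part of fclabels.

```haskell
data Named a = Named
  { name :: String
  , item :: a
  } deriving (Eq, Show)

data DeviceArray = DeviceArray
  { bytes :: Int
  } deriving (Eq, Show)

-- The direct nested update, as written earlier.
addBytes :: Int -> Named DeviceArray -> Named DeviceArray
addBytes b na = na { item = (item na) { bytes = b + bytes (item na) } }

-- Hand-written modifiers, one per field.
modifyItem :: (a -> a) -> Named a -> Named a
modifyItem f na = na { item = f (item na) }

modifyBytes :: (Int -> Int) -> DeviceArray -> DeviceArray
modifyBytes f da = da { bytes = f (bytes da) }

-- The same update, built by composing the modifiers.
addBytes' :: Int -> Named DeviceArray -> Named DeviceArray
addBytes' b = modifyItem (modifyBytes (b +))
```

This is the boilerplate that a label or lens library writes for you. Doing it by hand only stays manageable for a few fields.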
Record field names are in the same scope as the data type, so you cannot do this directly.
The common ways to work around this are to either add prefixes to the field names, e.g. daName, mpName, or put them in separate modules which you then import qualified.
What you can do is to put each data type in its own module, then you can use qualified imports to disambiguate. It's a little clunky, but it works.
There are several GHC extensions which may help. The linked one is applicable in your case.
Or you could refactor your code and use typeclasses for the common fields in records. Or you could manually prefix each record selector.
If you want to use the name in both, you can use a class that defines the name function. E.g.:
class Named a where
  name :: a -> String
data DeviceArray = DeviceArray
  { deviceArrayName :: String,
    bytes :: Int }
  deriving (Eq, Show, Data, Typeable)
instance Named DeviceArray where
  name = deviceArrayName
data MakefileParams = MakefileParams
  { makefileParamsName :: String }
  deriving (Eq, Show, Data, Typeable)
instance Named MakefileParams where
  name = makefileParamsName
And then you can use name on both types.

When should I use record syntax for data declarations in Haskell?

Record syntax seems extremely convenient compared to having to write your own accessor functions. I've never seen anyone give any guidelines as to when it's best to use record syntax over normal data declaration syntax, so I'll just ask here.
You should use record syntax in two situations:
The type has many fields
The type declaration gives no clue about its intended layout
For instance a Point type can be simply declared as:
data Point = Point Int Int deriving (Show)
It is obvious that the first Int denotes the x coordinate and the second stands for y. But the case with the following type declaration is different (taken from Learn You a Haskell for Great Good):
data Person = Person String String Int Float String String deriving (Show)
The intended type layout is: first name, last name, age, height, phone number, and favorite ice-cream flavor. But this is not evident in the above declaration. Record syntax comes in handy here:
data Person = Person { firstName :: String
                     , lastName :: String
                     , age :: Int
                     , height :: Float
                     , phoneNumber :: String
                     , flavor :: String
                     } deriving (Show)
The record syntax made the code more readable, and saved a great deal of typing by automatically defining all the accessor functions for us!
In addition to complex multi-fielded data, newtypes are often defined with record syntax. In either of these cases, there aren't really any downsides to using record syntax, but in the case of sum types, record accessors usually don't make sense. For example:
data Either a b = Left { getLeft :: a } | Right { getRight :: b }
is valid, but the accessor functions are partial – it is an error to write getLeft (Right "banana"). For that reason, such accessors are generally speaking discouraged; something like getLeft :: Either a b -> Maybe a would be more common, and that would have to be defined manually. However, note that accessors can share names:
data Item = Food { description :: String, tastiness :: Integer }
          | Wand { description :: String, magic :: Integer }
Now description is total, although tastiness and magic both still aren't.
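A manually defined total accessor of the kind mentioned above could look like the following sketch. It uses the Prelude's own Either rather than redefining it, and tastinessOf is a hypothetical name for a safe counterpart to the partial tastiness field.

```haskell
-- Total replacement for a partial getLeft accessor (Prelude's Either).
getLeft :: Either a b -> Maybe a
getLeft (Left x)  = Just x
getLeft (Right _) = Nothing

data Item
  = Food { description :: String, tastiness :: Integer }
  | Wand { description :: String, magic :: Integer }
  deriving (Show, Eq)

-- description is total because every constructor has that field.
-- tastiness is not, so a safe wrapper returns Maybe instead.
tastinessOf :: Item -> Maybe Integer
tastinessOf (Food _ t) = Just t
tastinessOf _          = Nothing
```

Here getLeft (Right "banana") evaluates to Nothing instead of failing at runtime, and tastinessOf does the same for a Wand.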

Resources