I am trying to design an API for some database system in Haskell, and I would like to model the columns of this database in a way such that interactions between columns of different tables cannot get mixed up.
More precisely, imagine that you have a type to represent a table in a database, associated to some type:
type Table a = ...
and that you can extract the columns of the table, along with the type of the column:
type Column col = ...
Finally, there are various extractors. For example, if your table contains descriptions of frogs, a function would let you extract the column containing the weight of the frog:
extractCol :: Table Frog -> Column Weight
Here is the question: I would like to distinguish the origin of the columns so that users cannot do operations between tables. For example:
bullfrogTable = undefined :: Table Frog
toadTable = undefined :: Table Frog
bullfrogWeights = extractCol bullfrogTable
toadWeights = extractCol toadTable
-- Or some other columns from the toad table
toadWeights' = extractCol toadTable
-- This should compile
addWeights toadWeights' toadWeights
-- This should trigger a type error
addWeights bullfrogWeights toadWeights
I know how to achieve this in Scala (using path-dependent types, see [1]), and I have been thinking of 3 options in Haskell:
not using types, and just doing a check at runtime (the current solution)
the TypeInType extension to add a phantom type on the Table type itself, and pass this extra type to the columns. I am not keen on it, because the construction of such a type would be very complicated (tables are generated through complex DAG operations) and probably slow to compile in this context.
wrapping the operations using a forall construct similar to the ST monad, but in my case, I would like the extra tagging type to actually escape the construction.
I am happy to have a very limited valid scoping for the construction of the same columns (i.e. columns from table and (id table) not being mixable), and I mostly care about the DSL feel of the API rather than the safety.
[1] What is meant by Scala's path-dependent types?
My current solution
Here is what I ended up doing, using RankNTypes.
I still want to give users the ability to use columns how they see fit without having some strong type checks, and opt in if they want some stronger type guarantees: this is a DSL for data scientist who will not know the power of the Haskell side
Tables are still tagged by their content:
type Table a = ...
and columns are now tagged with some extra reference types, on top of the type of the data they contain:
type Column ref col = ...
Projections from tables to columns are either tagged or untagged. In practice, this is hidden behind a lens-like DSL.
extractCol :: Table Frog -> Column Frog Weight
data TaggedTable ref a = TaggedTable { _ttTable :: Table a }
extractColTagged :: Table ref Frog -> Column ref Weight
withTag :: Table a -> (forall ref. TaggedTable ref a -> b) -> b
withTag tb f = f (TaggedTable tb)
Now I can write some code as following:
let doubleToadWeights = withTag toadTable $ \ttoadTable ->
let toadWeights = extractColTagged ttoadTable in
addWeights toadWeights toadWeights
and this will not compile, as desired:
let doubleToadWeights =
toadTable `withTag` \ttoads ->
bullfrogTable `withTag` \tbullfrogs ->
let toadWeights = extractColTagged ttoads
bullfrogWeights = extractColTagged tbullfrogs
in addWeights toadWeights bullfrogWeights -- Type error
From a DSL perspective, I believe it is not as straightforward as what one could achieve with Scala, but the type error message is understandable, which is paramount for me.
Haskell does not (as far as I know) have path dependent types, but you can get some of the way by using rank 2 types. For instance the ST monad has a dummy type parameter s that is used to prevent leakage between invocations of runST:
runST :: (forall s . ST s a) -> a
Within an ST action you can have an STRef:
newSTRef :: a -> ST s (STRef s a)
But the STRef you get carries the s type parameter, so it isn't allowed to escape from the runST.
Related
If I wanted to perform a search on a problem space and I wanted to keep track of different states a node has already visited, I several options to do it depending on the constraints of those states. However; is there a way I can dispatch a function or another depending on the constraint of the states the user is using as input? For example, if I had:
data Node a = Node { state :: a, cost :: Double }
And I wanted to perform a search on a Problem a, is there a way I could check if a is Eq, Ord or Hashable and then call a different kind of search? In pseudocode, something like:
search :: Eq a => Problem a -> Node a
search problem#(... initial ...) -- Where initial is a State of type a
| (Hashable initial) = searchHash problem
| (Ord initial) = searchOrd problem
| otherwise = searchEq problem
I am aware I could just let the user choose one search or another depending on their own use; but being able to do something like that could be very handy for me since search is not really one of the user endpoints as such (one example could be a function bfs, that calls search with some parameters to make it behave like a Breadth-First Search).
No, you can't do this. However, you could make your own class:
class Memorable a where
type Memory a
remember :: a -> Memory a -> Memory a
known :: a -> Memory a -> Bool
Instantiate this class for a few base types, and add some default implementations for folks that want to add new instances, e.g.
-- suitable implementations of Memorable methods and type families for hashable things
type HashMemory = Data.HashSet.HashSet
hashRemember = Data.HashSet.insert
hashKnown = Data.HashSet.member
-- suitable implementations for orderable things
type OrdMemory = Data.Set.Set
ordRemember = Data.Set.insert
ordKnown = Data.Set.member
-- suitable implementations for merely equatable things
type EqMemory = Prelude.[]
eqRemember = (Prelude.:)
eqKnown = Prelude.elem
Consider the following fragment:
data File
= NoFile
| FileInfo {
path :: FilePath,
modTime :: Data.Time.Clock.UTCTime
}
| FileFull {
path :: FilePath,
modTime :: Data.Time.Clock.UTCTime,
content :: String
}
deriving Eq
That duplication is a bit of a "wart", though in this one-off instance not particularly painful. In order to further improve my understanding of Haskell's rich type system, what might be preferred "clean"/"idiomatic" approaches for refactoring other than either simply creating a separate data record type for the 2 duplicate fields (then replacing them with single fields of that new data type) or replacing the FileFull record notation with something like | FileFull File String, which wouldn't be quite clean either (as here one would only want FileInfo in there for example, not NoFile)?
(Both these "naive" approaches would be somewhat intrusive/annoying with respect to having to then fix up many modules manually throughout the rest of the code-base here.)
One thing I considered would be parameterizing like so:
data File a
= NoFile
| FileMaybeWithContent {
path :: FilePath,
modTime :: Data.Time.Clock.UTCTime
content :: a
}
deriving Eq
Then for those "just info, not loaded" contexts a would be (), otherwise String. Seems too general anyway, we want either String or nothing, leading us to Maybe, doing once again away with the a parameter.
Of course we've been there before: content could just be done with Maybe String of course, then "refactor any compile errors away" and "done". That'll probably be the order of the day, but knowing Haskell and the many funky GHC extensions.. who knows just what exotic theoretic trick/axiom/law I've been missing, right?! See, the differently-named "semantic insta-differentiator" between a "just meta-data info" value and a "file content with meta info" value does work well throughout the rest of the code-base as far as eased comprehension.
(And yes, I perhaps should have removed NoFile and used Maybe Files throughout, but then... not sure whether there's really a solid reason to do so and a different question altogether anyway..)
All of the following are equivalent/isomorphic, as I think you've discovered:
data F = U | X A B | Y A B C
data F = U | X AB | Y AB C
data AB = AB A B
data F = U | X A B (Maybe C)
So the color of the bike shed really depends on the context (e.g. do you have use for an AB elsewhere?) and your own aesthetic preferences.
It might clarify things and help you understand what you're doing to have some sense of the algebra of algebraic data types
We call types like Either "sum types" and types like (,) "product types" and they are subject to the same kinds of transformations you're familiar with like factoring
f = 1 + (a * b) + (a * b * c)
= 1 + ((a * b) * ( 1 + c))
As others have noted, the NoFile constructor is probably not necessary, but you can keep it if you want. If you feel your code is more readable and/or better understood with it, then I say keep it.
Now the trick with combining the other two constructors is by hiding the content field. You were on the right track by parameterizing File, but that alone isn't enough since then we can have File Foo, File Bar, etc. Fortunately, GHC has some nifty ways to help us.
I'll write out the code here and then explain how it works.
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE DataKinds #-}
import Data.Void
data Desc = Info | Full
type family Content (a :: Desc) where
Content Full = String
Content _ = Void
data File a = File
{ path :: FilePath
, modTime :: UTCTime
, content :: Content a
}
There are a few things going on here.
First, note that in the File record, the content field now has type Content a instead of just a. Content is a type family, which is (in my opinion) a confusing name for type-level function. That is, the compiler replaces Content a with some other type based on what a is and how we've defined Content.
We defined Content Full to be String, so that when we have a value f1 :: File Full, its content field will have a String value. On the other hand, f2 :: File Info will have a content field with type Void which has no values.
Cool right? But what's preventing us from having File Foo now?
That's where DataKinds comes to the rescue. It "promotes" the data type Desc to a kind (the type of types in Haskell) and type constructors ,Info and Full, to types of kind Desc instead of merely values of type Desc.
Notice in the declaration of Content that I have annotated a. It looks like a type annotation, but a is already a type. This is a kind annotation. It forces a to be something of kind Desc and the only types of kind Desc are Info and Full.
By now you're probably totally sold on how awesome this is, but I should warn you there's no free lunch. In particular, this is a compile-time construction. Your single File type becomes two different types. This can cause other related logic (producers and consumers of File records) to become complicated. If your use case doesn't mix File Info records with File Full records, then this is the way to go. On the other hand, if you want to do something like have a list of File records which can be a mixture of both types, then you're better off just making the type of your content field Maybe String.
Another thing is, how exactly do you make a File Info since there's no value of Void to use for the content field? Well, technically it should be ok to use undefined or error "this should never happen" since it is (morally) impossible to have a function of type Void -> a, but if that makes you feel uneasy (and it probably should), then just replace Void with (). Unit is almost as useless and doesn't require 'values' of bottom.
I am learning Haskell from learnyouahaskell.com. I am having trouble understanding type constructors and data constructors. For example, I don't really understand the difference between this:
data Car = Car { company :: String
, model :: String
, year :: Int
} deriving (Show)
and this:
data Car a b c = Car { company :: a
, model :: b
, year :: c
} deriving (Show)
I understand that the first is simply using one constructor (Car) to built data of type Car. I don't really understand the second one.
Also, how do data types defined like this:
data Color = Blue | Green | Red
fit into all of this?
From what I understand, the third example (Color) is a type which can be in three states: Blue, Green or Red. But that conflicts with how I understand the first two examples: is it that the type Car can only be in one state, Car, which can take various parameters to build? If so, how does the second example fit in?
Essentially, I am looking for an explanation that unifies the above three code examples/constructs.
In a data declaration, a type constructor is the thing on the left hand side of the equals sign. The data constructor(s) are the things on the right hand side of the equals sign. You use type constructors where a type is expected, and you use data constructors where a value is expected.
Data constructors
To make things simple, we can start with an example of a type that represents a colour.
data Colour = Red | Green | Blue
Here, we have three data constructors. Colour is a type, and Green is a constructor that contains a value of type Colour. Similarly, Red and Blue are both constructors that construct values of type Colour. We could imagine spicing it up though!
data Colour = RGB Int Int Int
We still have just the type Colour, but RGB is not a value – it's a function taking three Ints and returning a value! RGB has the type
RGB :: Int -> Int -> Int -> Colour
RGB is a data constructor that is a function taking some values as its arguments, and then uses those to construct a new value. If you have done any object-oriented programming, you should recognise this. In OOP, constructors also take some values as arguments and return a new value!
In this case, if we apply RGB to three values, we get a colour value!
Prelude> RGB 12 92 27
#0c5c1b
We have constructed a value of type Colour by applying the data constructor. A data constructor either contains a value like a variable would, or takes other values as its argument and creates a new value. If you have done previous programming, this concept shouldn't be very strange to you.
Intermission
If you'd want to construct a binary tree to store Strings, you could imagine doing something like
data SBTree = Leaf String
| Branch String SBTree SBTree
What we see here is a type SBTree that contains two data constructors. In other words, there are two functions (namely Leaf and Branch) that will construct values of the SBTree type. If you're not familiar with how binary trees work, just hang in there. You don't actually need to know how binary trees work, only that this one stores Strings in some way.
We also see that both data constructors take a String argument – this is the String they are going to store in the tree.
But! What if we also wanted to be able to store Bool, we'd have to create a new binary tree. It could look something like this:
data BBTree = Leaf Bool
| Branch Bool BBTree BBTree
Type constructors
Both SBTree and BBTree are type constructors. But there's a glaring problem. Do you see how similar they are? That's a sign that you really want a parameter somewhere.
So we can do this:
data BTree a = Leaf a
| Branch a (BTree a) (BTree a)
Now we introduce a type variable a as a parameter to the type constructor. In this declaration, BTree has become a function. It takes a type as its argument and it returns a new type.
It is important here to consider the difference between a concrete type (examples include Int, [Char] and Maybe Bool) which is a type that can be assigned to a value in your program, and a type constructor function which you need to feed a type to be able to be assigned to a value. A value can never be of type "list", because it needs to be a "list of something". In the same spirit, a value can never be of type "binary tree", because it needs to be a "binary tree storing something".
If we pass in, say, Bool as an argument to BTree, it returns the type BTree Bool, which is a binary tree that stores Bools. Replace every occurrence of the type variable a with the type Bool, and you can see for yourself how it's true.
If you want to, you can view BTree as a function with the kind
BTree :: * -> *
Kinds are somewhat like types – the * indicates a concrete type, so we say BTree is from a concrete type to a concrete type.
Wrapping up
Step back here a moment and take note of the similarities.
A data constructor is a "function" that takes 0 or more values and gives you back a new value.
A type constructor is a "function" that takes 0 or more types and gives you back a new type.
Data constructors with parameters are cool if we want slight variations in our values – we put those variations in parameters and let the guy who creates the value decide what arguments they are going to put in. In the same sense, type constructors with parameters are cool if we want slight variations in our types! We put those variations as parameters and let the guy who creates the type decide what arguments they are going to put in.
A case study
As the home stretch here, we can consider the Maybe a type. Its definition is
data Maybe a = Nothing
| Just a
Here, Maybe is a type constructor that returns a concrete type. Just is a data constructor that returns a value. Nothing is a data constructor that contains a value. If we look at the type of Just, we see that
Just :: a -> Maybe a
In other words, Just takes a value of type a and returns a value of type Maybe a. If we look at the kind of Maybe, we see that
Maybe :: * -> *
In other words, Maybe takes a concrete type and returns a concrete type.
Once again! The difference between a concrete type and a type constructor function. You cannot create a list of Maybes - if you try to execute
[] :: [Maybe]
you'll get an error. You can however create a list of Maybe Int, or Maybe a. That's because Maybe is a type constructor function, but a list needs to contain values of a concrete type. Maybe Int and Maybe a are concrete types (or if you want, calls to type constructor functions that return concrete types.)
Haskell has algebraic data types, which very few other languages have. This is perhaps what's confusing you.
In other languages, you can usually make a "record", "struct" or similar, which has a bunch of named fields that hold various different types of data. You can also sometimes make an "enumeration", which has a (small) set of fixed possible values (e.g., your Red, Green and Blue).
In Haskell, you can combine both of these at the same time. Weird, but true!
Why is it called "algebraic"? Well, the nerds talk about "sum types" and "product types". For example:
data Eg1 = One Int | Two String
An Eg1 value is basically either an integer or a string. So the set of all possible Eg1 values is the "sum" of the set of all possible integer values and all possible string values. Thus, nerds refer to Eg1 as a "sum type". On the other hand:
data Eg2 = Pair Int String
Every Eg2 value consists of both an integer and a string. So the set of all possible Eg2 values is the Cartesian product of the set of all integers and the set of all strings. The two sets are "multiplied" together, so this is a "product type".
Haskell's algebraic types are sum types of product types. You give a constructor multiple fields to make a product type, and you have multiple constructors to make a sum (of products).
As an example of why that might be useful, suppose you have something that outputs data as either XML or JSON, and it takes a configuration record - but obviously, the configuration settings for XML and for JSON are totally different. So you might do something like this:
data Config = XML_Config {...} | JSON_Config {...}
(With some suitable fields in there, obviously.) You can't do stuff like this in normal programming languages, which is why most people aren't used to it.
Start with the simplest case:
data Color = Blue | Green | Red
This defines a "type constructor" Color which takes no arguments - and it has three "data constructors", Blue, Green and Red. None of the data constructors takes any arguments. This means that there are three of type Color: Blue, Green and Red.
A data constructor is used when you need to create a value of some sort. Like:
myFavoriteColor :: Color
myFavoriteColor = Green
creates a value myFavoriteColor using the Green data constructor - and myFavoriteColor will be of type Color since that's the type of values produced by the data constructor.
A type constructor is used when you need to create a type of some sort. This is usually the case when writing signatures:
isFavoriteColor :: Color -> Bool
In this case, you are calling the Color type constructor (which takes no arguments).
Still with me?
Now, imagine you not only wanted to create red/green/blue values but you also wanted to specify an "intensity". Like, a value between 0 and 256. You could do that by adding an argument to each of the data constructors, so you end up with:
data Color = Blue Int | Green Int | Red Int
Now, each of the three data constructors takes an argument of type Int. The type constructor (Color) still doesn't take any arguments. So, my favorite color being a darkish green, I could write
myFavoriteColor :: Color
myFavoriteColor = Green 50
And again, it calls the Green data constructor and I get a value of type Color.
Imagine if you don't want to dictate how people express the intensity of a color. Some might want a numeric value like we just did. Others may be fine with just a boolean indicating "bright" or "not so bright". The solution to this is to not hardcode Int in the data constructors but rather use a type variable:
data Color a = Blue a | Green a | Red a
Now, our type constructor takes one argument (another type which we just call a!) and all of the data constructors will take one argument (a value!) of that type a. So you could have
myFavoriteColor :: Color Bool
myFavoriteColor = Green False
or
myFavoriteColor :: Color Int
myFavoriteColor = Green 50
Notice how we call the Color type constructor with an argument (another type) to get the "effective" type which will be returned by the data constructors. This touches the concept of kinds which you may want to read about over a cup of coffee or two.
Now we figured out what data constructors and type constructors are, and how data constructors can take other values as arguments and type constructors can take other types as arguments. HTH.
As others pointed out, polymorphism isn't that terrible useful here. Let's look at another example you're probably already familiar with:
Maybe a = Just a | Nothing
This type has two data constructors. Nothing is somewhat boring, it doesn't contain any useful data. On the other hand Just contains a value of a - whatever type a may have. Let's write a function which uses this type, e.g. getting the head of an Int list, if there is any (I hope you agree this is more useful than throwing an error):
maybeHead :: [Int] -> Maybe Int
maybeHead [] = Nothing
maybeHead (x:_) = Just x
> maybeHead [1,2,3] -- Just 1
> maybeHead [] -- None
So in this case a is an Int, but it would work as well for any other type. In fact you can make our function work for every type of list (even without changing the implementation):
maybeHead :: [t] -> Maybe t
maybeHead [] = Nothing
maybeHead (x:_) = Just x
On the other hand you can write functions which accept only a certain type of Maybe, e.g.
doubleMaybe :: Maybe Int -> Maybe Int
doubleMaybe Just x = Just (2*x)
doubleMaybe Nothing= Nothing
So long story short, with polymorphism you give your own type the flexibility to work with values of different other types.
In your example, you may decide at some point that String isn't sufficient to identify the company, but it needs to have its own type Company (which holds additional data like country, address, back accounts etc). Your first implementation of Car would need to change to use Company instead of String for its first value. Your second implementation is just fine, you use it as Car Company String Int and it would work as before (of course functions accessing company data need to be changed).
The second one has the notion of "polymorphism" in it.
The a b c can be of any type. For example, a can be a [String], b can be [Int]
and c can be [Char].
While the first one's type is fixed: company is a String, model is a String and year is Int.
The Car example might not show the significance of using polymorphism. But imagine your data is of the list type. A list can contain String, Char, Int ... In those situations, you will need the second way of defining your data.
As to the third way I don't think it needs to fit into the previous type. It's just one other way of defining data in Haskell.
This is my humble opinion as a beginner myself.
Btw: Make sure that you train your brain well and feel comfortable to this. It is the key to understand Monad later.
It's about types: In the first case, your set the types String (for company and model) and Int for year. In the second case, your are more generic. a, b, and c may be the very same types as in the first example, or something completely different. E.g., it may be useful to give the year as string instead of integer. And if you want, you may even use your Color type.
I am attempting to create a function in Haskell returning the Resp type illustrated below in a strange mix between BNF and Haskell types.
elem ::= String | (String, String, Resp)
Resp ::= [elem]
My question is (a) how to define this type in Haskell, and (b) if there is a way of doing so without being forced to use custom constructors, e.g., Node, rather using only tuples and arrays.
You said that "the variety of keywords (data, type, newtype) has been confusing for me". Here's a quick primer on the data construction keywords in Haskell.
Data
The canonical way to create a new type is with the data keyword. A general type in Haskell is a union of product types, each of which is tagged with a constructor. For example, an Employee might be a line worker (with a name and a salary) or a manager (with a name, salary and a list of reports).
We use the String type to represent an employee's name, and the Int type to represent a salaray. A list of reports is just a list of Employees.
data Employee = Worker String Int
| Manager String Int [Employee]
Type
The type keyword is used to create type synonyms, i.e. alternate names for the same type. This is typically used to make the source more immediately understandable. For example, we could declare a type Name for employee names (which is really just a String) and Salary for salaries (which are just Ints), and Reports for a list of reports.
type Name = String
type Salary = Int
type Reports = [Employee]
data Employee = Worker Name Salary
| Manager Name Salary Reports
Newtype
The newtype keyword is similar to the type keyword, but it adds an extra dash of type safety. One problem with the previous block of code is that, although a worker is a combination of a Name and a Salary, there is nothing to stop you using any old String in the Name field (for example, an address). The compiler doesn't distinguish between Names and plain old Strings, which introduces a class of potential bugs.
With the newtype keyword we can make the compiler enforce that the only Strings that can be used in a Name field are the ones explicitly tagged as Names
newtype Name = Name String
newtype Salary = Salary Int
newtype Reports = Reports [Employee]
data Employee = Worker Name Salary
| Manager Name Salary Reports
Now if we tried to enter a String in the Name field without explicitly tagging it, we get a type error
>>> let kate = Worker (Name "Kate") (Salary 50000) -- this is ok
>>> let fred = Worker "18 Tennyson Av." (Salary 40000) -- this will fail
<interactive>:10:19:
Couldn't match expected type `Name' with actual type `[Char]'
In the first argument of `Worker', namely `"18 Tennyson Av."'
In the expression: Worker "18 Tennyson Av." (Salary 40000)
In an equation for `fred':
fred = Worker "18 Tennyson Av." (Salary 40000)
What's great about this is that because the compiler knows that a Name is really just a String, it optimizes away the extra constructor, so this is just as efficient as using a type declaration -- the extra type safety comes "for free". This requires an important restriction -- a newtype has exactly one constructor with exactly one value. Otherwise the compiler wouldn't know which constructor or value was the correct synonym!
One disadvantage of using a newtype declaration is that now a Salary is no longer just an Int, you can't directly add them together. For example
>>> let kate'sSalary = Salary 50000
>>> let fred'sSalary = Salary 40000
>>> kate'sSalary + fred'sSalary
<interactive>:14:14:
No instance for (Num Salary)
arising from a use of `+'
Possible fix: add an instance declaration for (Num Salary)
In the expression: kate'sSalary + fred'sSalary
In an equation for `it': it = kate'sSalary + fred'sSalary
The somewhat complicated error message is telling you that a Salary isn't a numeric type, so you can't add them together (or at least, you haven't told the compiler how to add them together). One option would be to define a function that gets the underlying Int from the Salary
getSalary :: Salary -> Int
getSalary (Salary sal) = sal
but in fact Haskell will write these for you if you use record syntax when declaring your newtypes
data Salary = Salary { getSalary :: Int }
Now you can write
>>> getSalary kate'sSalary + getSalary fred'sSalary
90000
Part 1:
data Elem = El String | Node String String Resp
type Resp = [Elem]
Part 2: Well... kinda. The unsatisfying answer is: You shouldn't want to because doing so is less type safe. The more direct answer is Elem needs it's own constructor but Resp is easily defined as a type synonym as above. However, I would recommend
newtype Resp = Resp { getElems :: [Elem] }
so that you can't mix up some random list of Elems with a Resp. This also gives you the function getElems so you don't have to do as much pattern matching on a single constructor. The newtype basically let's Haskell know that it should get rid of the overhead of the constructor during runtime so there's no extra indirection which is nice.
I just uncovered this confusion and would like a confirmation that it is what it is. Unless, of course, I am just missing something.
Say, I have these data declarations:
data VmInfo = VmInfo {name, index, id :: String} deriving (Show)
data HostInfo = HostInfo {name, index, id :: String} deriving (Show)
vm = VmInfo "vm1" "01" "74653"
host = HostInfo "host1" "02" "98732"
What I always thought and what seems to be so natural and logical is this:
vmName = vm.name
hostName = host.name
But this, obviously, does not work. I got this.
Questions
So my questions are.
When I create a data type with record syntax, do I have to make sure that all the fields have unique names? If yes - why?
Is there a clean way or something similar to a "scope resolution operator", like :: or ., etc., so that Haskell distinguishes which data type the name (or any other none unique fields) belongs to and returns the correct result?
What is the correct way to deal with this if I have several declarations with the same field names?
As a side note.
In general, I need to return data types similar to the above example.
First I returned them as tuples (seemed to me the correct way at the time). But tuples are hard to work with as it is impossible to extract individual parts of a complex type as easy as with the lists using "!!". So next thing I thought of the dictionaries/hashes.
When I tried using dictionaries I thought what is the point of having own data types then?
Playing/learning data types I encountered the fact that led me to the above question.
So it looks like it is easier for me to use dictionaries instead of own data types as I can use the same fields for different objects.
Can you please elaborate on this and tell me how it is done in real world?
Haskell record syntax is a bit of a hack, but the record name emerges as a function, and that function has to have a unique type. So you can share record-field names among constructors of a single datatype but not among distinct datatypes.
What is the correct way to deal with this if I have several declarations with the same field names?
You can't. You have to use distinct field names. If you want an overloaded name to select from a record, you can try using a type class. But basically, field names in Haskell don't work the way they do in say, C or Pascal. Calling it "record syntax" might have been a mistake.
But tuples are hard to work with as it is impossible to extract individual parts of a complex type
Actually, this can be quite easy using pattern matching. Example
smallId :: VmInfo -> Bool
smallId (VmInfo { vmId = n }) = n < 10
As to how this is done in the "real world", Haskell programmers tend to rely heavily on knowing what type each field is at compile time. If you want the type of a field to vary, a Haskell programmer introduces a type parameter to carry varying information. Example
data VmInfo a = VmInfo { vmId :: Int, vmName :: String, vmInfo :: a }
Now you can have VmInfo String, VmInfo Dictionary, VmInfo Node, or whatever you want.
Summary: each field name must belong to a unique type, and experienced Haskell programmers work with the static type system instead of trying to work around it. And you definitely want to learn about pattern matching.
There are more reasons why this doesn't work: lowercase typenames and data constructors, OO-language-style member access with .. In Haskell, those member access functions actually are free functions, i.e. vmName = name vm rather than vmName = vm.name, that's why they can't have same names in different data types.
If you really want functions that can operate on both VmInfo and HostInfo objects, you need a type class, such as
class MachineInfo m where
name :: m -> String
index :: m -> String -- why String anyway? Shouldn't this be an Int?
id :: m -> String
and make instances
instance MachineInfo VmInfo where
name (VmInfo vmName _ _) = vmName
index (VmInfo _ vmIndex _) = vmIndex
...
instance MachineInfo HostInfo where
...
Then name machine will work if machine is a VmInfo as well as if it's a HostInfo.
Currently, the named fields are top-level functions, so in one scope there can only be one function with that name. There are plans to create a new record system that would allow having fields of the same name in different record types in the same scope, but that's still in the design phase.
For the time being, you can make do with unique field names, or define each type in its own module and use the module-qualified name.
Lenses can help take some of the pain out of dealing with getting and setting data structure elements, especially when they get nested. They give you something that looks, if you squint, kind of like object-oriented accessors.
Learn more about the Lens family of types and functions here: http://lens.github.io/tutorial.html
As an example for what they look like, this is a snippet from the Pong example found at the above github page:
data Pong = Pong
{ _ballPos :: Point
, _ballSpeed :: Vector
, _paddle1 :: Float
, _paddle2 :: Float
, _score :: (Int, Int)
, _vectors :: [Vector]
-- Since gloss doesn't cover this, we store the set of pressed keys
, _keys :: Set Key
}
-- Some nice lenses to go with it
makeLenses ''Pong
That makes lenses to access the members without the underscores via some TemplateHaskell magic.
Later on, there's an example of using them:
-- Update the paddles
updatePaddles :: Float -> State Pong ()
updatePaddles time = do
p <- get
let paddleMovement = time * paddleSpeed
keyPressed key = p^.keys.contains (SpecialKey key)
-- Update the player's paddle based on keys
when (keyPressed KeyUp) $ paddle1 += paddleMovement
when (keyPressed KeyDown) $ paddle1 -= paddleMovement
-- Calculate the optimal position
let optimal = hitPos (p^.ballPos) (p^.ballSpeed)
acc = accuracy p
target = optimal * acc + (p^.ballPos._y) * (1 - acc)
dist = target - p^.paddle2
-- Move the CPU's paddle towards this optimal position as needed
when (abs dist > paddleHeight/3) $
case compare dist 0 of
GT -> paddle2 += paddleMovement
LT -> paddle2 -= paddleMovement
_ -> return ()
-- Make sure both paddles don't leave the playing area
paddle1 %= clamp (paddleHeight/2)
paddle2 %= clamp (paddleHeight/2)
I recommend checking out the whole program in its original location and looking through the rest of the lens material; it's very interesting even if you don't end up using them.
Yes, you cannot have two records in the same module with the same field names. The field names are added to the module's scope as functions, so you would use name vm rather than vm.name. You could have two records with the same field names in different modules and import one of the modules qualified as some name, but this is probably awkward to work with.
For a case like this, you should probably just use a normal algebraic data type:
data VMInfo = VMInfo String String String
(Note that the VMInfo has to be capitalized.)
Now you can access the fields of VMInfo by pattern matching:
myFunc (VMInfo name index id) = ... -- name, index and id are bound here