Simple Refinement Types in Haskell

From Scott Wlaschin's blog post and book "Domain Modeling Made Functional", and from Alexis King's post, I take that a domain model should encode as much information about the domain as is practical in the types, so as to "make illegal states unrepresentable" and to obtain strong guarantees that allow me to write domain logic functions that are total.
In basic enterprise applications, we have many basic domain types like street names, company names, cities and the like. To represent them in a form that prevents most errors later on, I would like to use a type that lets me
restrict the maximum and minimum number of characters,
specify the subset of characters that may be used,
add additional constraints, like no leading or trailing whitespace.
I can think of two ways to implement such types: As custom abstract data types with smart constructors and hidden data constructors, or via some type-level machinery (I vaguely read about refinement types? Can such types be represented via some of the newer language extensions? via LiquidHaskell?). Which is the sensible way to go? Which approach most easily works with all the functions that operate on regular Text, and how can I most easily combine two or more values of the same refined type, map over them, etc.?
Ideally, there would be a library to help me create such custom types. Is there?

Following Alexis King's blog, I'd say that a suitable solution would be something like the code below. Of course, other solutions are possible.
import Control.Monad ((>=>))
newtype StreetName = StreetName {getStreetName :: String}
-- Compose all validations, then wrap the validated string in the constructor.
asStreetName :: String -> Maybe StreetName
asStreetName = fmap StreetName . (rightNumberOfChars >=> rightCharSubset >=> noTrailingWhiteSpace)
-- These functions process the string and produce another validated string, or Nothing.
rightNumberOfChars :: String -> Maybe String
rightNumberOfChars s = {- ... -}
rightCharSubset :: String -> Maybe String
rightCharSubset s = {- ... -}
noTrailingWhiteSpace :: String -> Maybe String
noTrailingWhiteSpace s = {- ... -}
main = do
  street <- readStreetAsString
  case asStreetName street of
    Just s -> {- s is now validated -}
    Nothing -> {- handle the error -}
Make StreetName a hidden constructor (don't export it from the module), and use asStreetName as a smart constructor. Remember that other functions should take StreetName instead of String in their types, to make sure that the data has been validated.
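To make the idea concrete, here is a minimal sketch with the validation bodies filled in. The specific rules (1 to 50 characters, letters plus a few punctuation characters) and the names mkStreetName, noOuterWhiteSpace and mapStreetName are assumptions chosen for illustration, not something taken from the question. It uses String to stay within base; the same shape should carry over to Text.

```haskell
import Control.Monad ((>=>))
import Data.Char (isAlpha, isSpace)

-- In a real project this would live in its own module that exports
-- StreetName, getStreetName and mkStreetName, but NOT the data constructor.
newtype StreetName = StreetName { getStreetName :: String }
  deriving (Eq, Show)

-- Hypothetical rule: between 1 and 50 characters.
rightNumberOfChars :: String -> Maybe String
rightNumberOfChars s
  | n >= 1 && n <= 50 = Just s
  | otherwise         = Nothing
  where n = length s

-- Hypothetical rule: letters, spaces, '.', '-' and '\'' only.
rightCharSubset :: String -> Maybe String
rightCharSubset s
  | all allowed s = Just s
  | otherwise     = Nothing
  where allowed c = isAlpha c || c `elem` " .-'"

-- Rejects leading or trailing whitespace.
noOuterWhiteSpace :: String -> Maybe String
noOuterWhiteSpace s = case s of
  []                                       -> Nothing
  (c : _) | isSpace c || isSpace (last s)  -> Nothing
  _                                        -> Just s

-- The smart constructor: run every check, then wrap.
mkStreetName :: String -> Maybe StreetName
mkStreetName = fmap StreetName . (rightNumberOfChars >=> rightCharSubset >=> noOuterWhiteSpace)

-- "Mapping" over a refined value: unwrap, transform, re-validate.
mapStreetName :: (String -> String) -> StreetName -> Maybe StreetName
mapStreetName f = mkStreetName . f . getStreetName
```

Any function that works on a plain String can still be used by unwrapping with getStreetName; the result only becomes a StreetName again by going back through mkStreetName, which is why mapStreetName returns a Maybe.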

Related

understanding data structure in Haskell

I have a problem with a homework (the topic is : "functional data structures").
Please understand that I don't want anyone to solve my homework.
I just have a problem with understanding the structure of this :
data Heap e t = Heap {
  empty :: t e,
  insert :: e -> t e -> t e,
  findMin :: t e -> Maybe e,
  deleteMin :: t e -> Maybe (t e),
  merge :: t e -> t e -> t e,
  contains :: e -> t e -> Maybe Int
}
In my understanding "empty", "insert" and so on are functions which can be applied to "Heap"-type data.
Now I just want to understand what that "Heap" thing looks like.
So I was typing things like :
a = Heap 42 42
But I get errors I can't really work with.
Maybe it is a dumb question and I'm just stuck at this point for no reason, but it is killing me at the moment.
Thankful for any help
If you truly wish to understand that type, you need to understand a few requisites first.
types and values (and functions)
Firstly, you need to understand what types and values are. I'm going to assume you understand this. You understand, for example, the separation between "hello" as a value and its type, String and you understand clearly what it means when I say a = "hello" :: String and:
a :: String
a = "hello"
If you don't understand that, then you need to research values and types in Haskell. There are a myriad of books that can help here, such as this one, which I helped to author: http://happylearnhaskelltutorial.com
I'm also going to assume you understand what functions and currying are, and how to use both of them.
polymorphic types
Secondly, as your example contains type variables, you'll need to understand what they are. That is, you need to understand what polymorphic types are, such as Maybe a or Either a b. You'll also need to understand how Maybe String differs from Maybe Int, and what types like Num a => [a] and Num a => [Maybe a] mean.
Again, there are many free or paid books that can help, the example above covers this, too.
algebraic data types
Next up is algebraic data types. This is a pretty amazingly cool feature that Haskell has. Haskell-like languages such as Elm and Idris have it as well as others like Rust, too. It lets you define your own data types. These aren't just things like Structs in other languages, and yeah, they can even contain functions.
Maybe is actually an example of an algebraic data type. If you understand these, you'll know that:
data Direction = North | South | East | West
defines a data type called Direction whose values can only be one of North, South, East or West, and you'll know that you can also use the polymorphic type variables above to parameterise your types like so:
data Tree a = EmptyNode | Node a (Tree a) (Tree a)
which uses both optionality (as in the sum type of Direction above) as well as parameterization.
In addition to this, you can also have multiple types in each value. These are called product types, and Haskell's algebraic datatypes can be expressed as a combination of Sum types that can contain Product types. For example:
type Location = (Float, Float)
data ShapeNode = StringNode Location String | CircleNode Location Float | SquareNode Location Float Float
That is, each value can be one of StringNode, CircleNode or SquareNode, and in each case a different set of fields is given to the value. To create a StringNode, for example, you'd pass the field values to its constructor like this: StringNode (10.0, 5.3) "A String".
Again, the freely available books will go into much more detail about these things, but we're moving in the direction of getting more than a basic understanding of Haskell now.
Finally, in order to fully understand your example, you'll need to know about...
record types
Record types are the same as product types above, except that the fields are labelled rather than being anonymous. So, you could define the shape node data type like this, instead:
type Location = (Float, Float)
data ShapeNode
  = StringNode { stringLocation :: Location, stringData :: String }
  | CircleNode { circleLocation :: Location, radius :: Float }
  | SquareNode { squareLocation :: Location, length :: Float, height :: Float }
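Each field is named, and a field name can't be repeated within a single constructor (different constructors of the same type may share a field name, provided the field has the same type in each).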
All that you need in addition to this to understand the above example is to realise your example contains all of these things together, along with the fact that you have functions as your record field values in the data type you have.
It's a good idea to thoroughly flesh out your understanding and not skip any steps, then you'll be able to follow these kinds of things much more easily in the future. :) I wish you luck!
Heap is a record with six elements. In order to create a value of that type, you must supply all six elements. Assuming that you have appropriate values and functions, you can create a value like this:
myHeap = Heap myEmpty myInsert myFindMin myDeleteMin myMerge myContains
This doesn't seem like idiomatic Haskell design, however. Why not define generic functions independent of the data, or, if they must be bundled together, a typeclass?
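As a rough illustration of what "supplying all six elements" can look like, here is a sketch that fills the record with list-based operations, so that t becomes [] and the heap is simply a sorted list. The name listHeap and the choice of a sorted list are assumptions made for this example, and contains is guessed to mean "position of the element, if present", since the homework text does not say.

```haskell
import qualified Data.List as L
import Data.Maybe (listToMaybe)

data Heap e t = Heap
  { empty     :: t e
  , insert    :: e -> t e -> t e
  , findMin   :: t e -> Maybe e
  , deleteMin :: t e -> Maybe (t e)
  , merge     :: t e -> t e -> t e
  , contains  :: e -> t e -> Maybe Int
  }

-- A sorted list used as a (slow) heap; t is instantiated to [].
listHeap :: Ord e => Heap e []
listHeap = Heap
  { empty     = []
  , insert    = L.insert            -- keeps the list sorted
  , findMin   = listToMaybe         -- smallest element is at the front
  , deleteMin = \xs -> case xs of
      []         -> Nothing
      (_ : rest) -> Just rest
  , merge     = \xs ys -> L.sort (xs ++ ys)
  , contains  = L.elemIndex         -- assumed meaning: index of the element
  }

-- The field names double as accessor functions that take the record first.
example :: Maybe Int
example = findMin listHeap (insert listHeap 3 (insert listHeap 1 (empty listHeap)))
```

This is also why Heap 42 42 is rejected: the constructor expects six arguments of the field types above (a container and five functions), not two numbers. Here example evaluates to Just 1.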

Redundancy regarding product types and tuples in Haskell

In Haskell you have product types and you have tuples.
You use tuples if you don't want to associate a dedicated type with the value, and you can use product types if you wish to do so.
However I feel there is redundancy in the notation of product types
data Foo = Foo (String, Int, Char)
data Bar = Bar String Int Char
Why are there both kinds of notations? Is there any case where you would prefer one over the other?
I guess you can't use record notation when using tuples, but that's just a convenience problem. Another thing might be the notion of order in tuples, as opposed to product types, but I think that's just due to the naming of the functions fst and snd.
@chi's answer is about the technical differences in terms of Haskell's evaluation model. I hope to give you some insight into the philosophy of this sort of typed programming.
In category theory we generally work with objects "up to isomorphism". Your Bar is of course isomorphic to (String, Int, Char), so from a categorical perspective they're the same thing.
bar_tuple :: Iso' Bar (String, Int, Char)
bar_tuple = iso to from
  where to (Bar s i c) = (s, i, c)
        from (s, i, c) = Bar s i c
In some sense tuples are a Platonic form of product type, in that they have no meaning beyond being a collection of disparate values. All the other product types can be mapped to and from a plain old tuple.
So why not use tuples everywhere, when all Haskell types ultimately boil down to a sum of products? It's about communication. As Martin Fowler says,
Any fool can write code that a computer can understand. Good programmers write code that humans can understand.
Names are important! Writing down a custom product type like
data Customer = Customer { name :: String, address :: String }
imbues the type Customer with meaning to the person reading the code, unlike (String, String) which just means "two strings".
Custom types are particularly useful when you want to enforce invariants by hiding the representation of your data and using smart constructors:
newtype NonEmpty a = NonEmpty [a]
nonEmpty :: [a] -> Maybe (NonEmpty a)
nonEmpty [] = Nothing
nonEmpty xs = Just (NonEmpty xs)
Now, if you don't export the NonEmpty constructor, you can force people to go through the nonEmpty smart constructor. If someone hands you a NonEmpty value you may safely assume that it has at least one element.
You can of course represent Customer as a tuple under the hood and expose evocatively-named field accessors,
newtype Customer = Customer (String, String)
name, address :: Customer -> String
name (Customer (n, _)) = n
address (Customer (_, a)) = a
but this doesn't really buy you much, except that it's now cheaper to convert Customer to a tuple (if, say, you're writing performance-sensitive code that works with a tuple-oriented API).
If your code is intended to solve a particular problem - which of course is the whole point of writing code - it pays to not just solve the problem, but make it look like you've solved it too. Someone - maybe you in a couple of years - is going to have to read this code and understand it with no a priori knowledge of how it works. Custom types are a very important communication tool in this regard.
The type
data Foo = Foo (String, Int, Char)
represents a double-lifted tuple. Its values comprise
undefined
Foo undefined
Foo (undefined, undefined, undefined)
etc.
This is usually troublesome. Because of this, it's rare to see such definitions in actual code. We either have plain data types
data Foo = Foo String Int Char
or newtypes
newtype Foo = Foo (String, Int, Char)
The newtype can be just as inconvenient to use, but at least it does not double-lift the tuple: undefined and Foo undefined are now equal values.
The newtype also provides zero-cost conversion between a plain tuple and Foo, in both directions.
You can see such newtypes in use e.g. when the programmer needs a different instance for some type class, than the one already associated with the tuple. Or, perhaps, it is used in a "smart constructor" idiom.
I would not expect the pattern used in Foo to be frequent. There is a slight difference in how the constructors behave: Foo :: (String, Int, Char) -> Foo as opposed to Bar :: String -> Int -> Char -> Bar. Then Foo undefined and Foo (undefined, ..., ...) are strictly speaking different things, whereas you miss one level of undefinedness in Bar.
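The extra level of laziness can be observed directly. The following is a small sketch, with the hypothetical names FooD, FooN, matchD, matchN and defined introduced only for this demonstration; it checks whether evaluating a value to weak head normal form throws.

```haskell
import Control.Exception (SomeException, evaluate, try)

data    FooD = FooD (String, Int, Char)  -- data: the constructor adds a level
newtype FooN = FooN (String, Int, Char)  -- newtype: same representation as the tuple

-- Matching on a data constructor forces the argument to that constructor.
matchD :: FooD -> Bool
matchD (FooD _) = True

-- Matching on a newtype constructor forces nothing.
matchN :: FooN -> Bool
matchN (FooN _) = True

-- True if evaluating the argument to weak head normal form succeeds.
defined :: a -> IO Bool
defined x = either onErr (const True) <$> try (evaluate x)
  where
    onErr :: SomeException -> Bool
    onErr _ = False
```

With these definitions, defined (matchD undefined) yields False while defined (matchN undefined) yields True; conversely, defined (FooD undefined) yields True (a constructor wrapping a bottom) while defined (FooN undefined) yields False, because FooN undefined is the same value as undefined.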

Why doesn't GHC Haskell support overloaded record parameter names?

What I am talking about is that it is not possible to define:
data A = A {name :: String}
data B = B {name :: String}
I know that GHC just desugars this to plain functions and the idiomatic way to solve this would be:
data A = A {aName :: String}
data B = B {bName :: String}
class Name a where
  name :: a -> String
instance Name A where
  name = aName
instance Name B where
  name = bName
After having written this out I don't like it that much ... couldn't this typeclassing be part of the desugaring process?
The thought came to me when I was writing some Aeson JSON parsing. Where it would have been easy to just derive the FromJSON instances for every data type, I had to write everything out by hand (currently >1k lines and counting).
Having names like name or simply value in a data record is not that uncommon.
http://www.haskell.org/haskellwiki/Performance/Overloading mentions that function overloading introduces some runtime overhead. But I actually don't see why the compiler wouldn't be able to resolve this at compile time and give them different names internally.
This SO question from 2012 more or less states historical reasons and points to a mail thread from 2006. Has anything changed recently?
Even if there were some runtime overhead, most people wouldn't mind, because most code is hardly performance-critical.
Is there some hidden language extension that actually allows this? Again I am not sure ... but I think Idris actually does this?
Many, mostly minor reasons. One is the problem raised by a better answer: overloading just on the first argument is insufficient to handle all the useful cases.
You could "desugar"
data A = A { name :: String }
data B = B { name :: Text }
into
class Has'name a b | a -> b where
  name :: a -> b
data A = A { aName :: String }
instance Has'name A String where
  name = aName
data B = B { bName :: Text }
instance Has'name B Text where
  name = bName
but that would require GHC extensions (Functional Dependencies) that haven't made it into the standard yet. It would preclude using just 'name' for record creation, updates, and pattern matching (view patterns might help there), since 'name' isn't "just" a function in those cases. You can probably pull off something very similar with Template Haskell.
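For reference, here is the hand-written version of that desugaring as a compilable sketch. The class is called HasName here, and B's field is an Int rather than Text so that the example needs nothing outside base; both are substitutions made for illustration only.

```haskell
{-# LANGUAGE FunctionalDependencies, FlexibleInstances #-}

-- The functional dependency says: the record type determines the field type.
class HasName a b | a -> b where
  name :: a -> b

data A = A { aName :: String }
data B = B { bName :: Int }

instance HasName A String where
  name = aName

instance HasName B Int where
  name = bName

-- One overloaded accessor, two different result types.
greeting :: String
greeting = "Hello, " ++ name (A "Betty")

doubled :: Int
doubled = 2 * name (B 21)
```

As the answer notes, name is only an accessor function here: construction, update and pattern matching still have to use aName and bName. Later GHC releases (8.0 onwards) also ship a DuplicateRecordFields extension that permits the same field name in several record types of one module.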
Using the record syntax
data A = A { name :: String }
implicitly defines a function
name :: A -> String
If we define both A and B with a { name :: String } field, we have conflicting type definitions for name:
name :: A -> String
name :: B -> String
It's not clear how your proposed implicit type classes would work because if we define two types
data A = A { name :: String }
data B = B { name :: Text }
then we have just shifted the problem to conflicting type class definitions:
class Has'name a where
  name :: a -> String
class Has'name a where
  name :: a -> Text
In principle this could be resolved one way or another, but this is just one of several tricky conflicting desirable properties for records. When Haskell was defined, it was decided that it was better to have simple if limited support rather than to try to design something more ambitious and complicated. Several improvements to records have been discussed at various times and there are perennial discussions, e.g. this Haskell Cafe thread. Perhaps something will be worked out for Haskell Prime.
The best way I found is to use a preprocessor to solve this rather stupid problem.
Haskell and GHC make this easy, because the whole Haskell parser is available as a normal library. You could just parse all the files, apply a renaming scheme (e.g. turning « data A = A { name :: String } » and « let a = A "Betty" in name a » into « data A = A { aName :: String } » and « let a = A "Betty" in aName a ») depending on the type of data the name function is applied to, using the type resolver, and write the results out for compilation.
But honestly, that should be integrated into GHC. You’re right: It’s silly that this isn’t included.

Haskell "dependent" fields of a record?

I've got the following record defined:
data Option = Option {
  a :: Maybe String,
  b :: Either String Int
} deriving (Show)
Is there any way for me to enforce that when a is Nothing, b must be a Left and when a is Just, b must be a Right? Maybe with phantom types, or something else? Or must I wrap the whole thing inside of an Either and make it Either String (String, Int)?
You should just use two constructors for the two possible shapes:
data Option = NoA String | WithA String Int
Of course, you should give them better names, based on what they represent. Phantom types are definitely overkill here, and I would suggest avoiding Either — Left and Right are not very self-documenting constructor names.
If it makes sense to interpret both Either branches of the b field as representing the same data, then you should define a function that reflects this interpretation:
b :: Option -> MeaningOfB
b (NoA s) = ...
b (WithA t n) = ...
If you have fields that stay the same no matter what the choice, you should make a new data type with all of them, and include it in both constructors. If you make each constructor a record, you can give the common field the same name in every constructor, so that you can extract it from any Option value without having to pattern-match on it.
Basically, think about what it means for the string not to be present: what does it change about the other fields, and what stays the same? Whatever changes should go in the respective constructors; whatever stays the same should be factored out into its own type. (This is a good design principle in general!)
If you come from an OOP background, you can think about this in terms of reasoning with composition instead of inheritance — but try not to take the analogy too far.
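A sketch of that factoring, under invented names: Common, label, priority, reason, aValue and count are all hypothetical and stand in for whatever the real fields mean.

```haskell
-- Fields that exist regardless of which shape the option takes (hypothetical).
data Common = Common
  { label    :: String
  , priority :: Int
  }

-- Each shape carries only the data that makes sense for it, plus the shared part.
data Option
  = NoA   { common :: Common, reason :: String }
  | WithA { common :: Common, aValue :: String, count :: Int }

-- The shared field has the same name in both constructors, so this works
-- for any Option without pattern matching.
optionLabel :: Option -> String
optionLabel = label . common

-- Shape-specific data is still reached by pattern matching.
describe :: Option -> String
describe (NoA _ why)   = "no a: " ++ why
describe (WithA _ a n) = a ++ " x" ++ show n
```

The invalid combinations from the original record (a is Nothing but b is a Right, and so on) simply cannot be written down with this type.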

How to define a class that allows uniform access to different records in Haskell?

I have two records that both have a field I want to extract for display. How do I arrange things so they can be manipulated with the same functions? Since they have different fields (in this case firstName and buildingName) that are their name fields, they each need some "adapter" code to map firstName to name. Here is what I have so far:
class Nameable a where
  name :: a -> String
data Human = Human {
  firstName :: String
}
data Building = Building {
  buildingName :: String
}
instance Nameable Human where
  name x = firstName x
instance Nameable Building where
  -- I think the x is redundant here, i.e the following should work:
  -- name = buildingName
  name x = buildingName x
main :: IO ()
main = do
  putStr $ show (map name items)
  where
    items :: (Nameable a) => [a]
    items = [ Human{firstName = "Don"}
            -- Ideally I want the next line in the array too, but that gives an
            -- obvious type error at the moment.
            --, Building{buildingName = "Empire State"}
            ]
This does not compile:
TypeTest.hs:23:14:
Couldn't match expected type `a' against inferred type `Human'
`a' is a rigid type variable bound by
the type signature for `items' at TypeTest.hs:22:23
In the expression: Human {firstName = "Don"}
In the expression: [Human {firstName = "Don"}]
In the definition of `items': items = [Human {firstName = "Don"}]
I would have expected the instance Nameable Human section would make this work. Can someone explain what I am doing wrong, and for bonus points what "concept" I am trying to get working, since I'm having trouble knowing what to search for.
This question feels similar, but I couldn't figure out the connection with my problem.
Consider the type of items:
items :: (Nameable a) => [a]
It's saying that for any Nameable type, items will give me a list of that type. It does not say that items is a list that may contain different Nameable types, as you might think. You want something like items :: [exists a. Nameable a => a], except that you'll need to introduce a wrapper type and use forall instead. (See: Existential type)
{-# LANGUAGE ExistentialQuantification #-}
data SomeNameable = forall a. Nameable a => SomeNameable a
[...]
items :: [SomeNameable]
items = [ SomeNameable $ Human {firstName = "Don"},
          SomeNameable $ Building {buildingName = "Empire State"} ]
The quantifier in the data constructor of SomeNameable basically allows it to forget everything about exactly which a is used, except that it is Nameable. Therefore, you will only be allowed to use functions from the Nameable class on the elements.
To make this nicer to use, you can make an instance for the wrapper:
instance Nameable SomeNameable where
  name (SomeNameable x) = name x
Now you can use it like this:
Main> map name items
["Don","Empire State"]
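Putting the pieces from the question and this answer together, a complete version might look like the following sketch. Nothing new is introduced beyond assembling the fragments above into one file.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

class Nameable a where
  name :: a -> String

data Human = Human { firstName :: String }
data Building = Building { buildingName :: String }

instance Nameable Human where
  name = firstName

instance Nameable Building where
  name = buildingName

-- The wrapper hides which concrete type is inside; only the Nameable
-- instance remains available.
data SomeNameable = forall a. Nameable a => SomeNameable a

instance Nameable SomeNameable where
  name (SomeNameable x) = name x

-- A heterogeneous list, made homogeneous by wrapping each element.
items :: [SomeNameable]
items =
  [ SomeNameable (Human { firstName = "Don" })
  , SomeNameable (Building { buildingName = "Empire State" })
  ]

names :: [String]
names = map name items
```

Note that the instance is declared for SomeNameable itself, with no type argument, because the wrapped type variable is not visible from outside the constructor.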
Everybody is reaching for either existential quantification or algebraic data types. But these are both overkill (well depending on your needs, ADTs might not be).
The first thing to note is that Haskell has no downcasting. That is, if you use the following existential:
data SomeNameable = forall a. Nameable a => SomeNameable a
then when you create an object
foo :: SomeNameable
foo = SomeNameable $ Human { firstName = "John" }
the information about which concrete type the object was made with (here Human) is forever lost. The only things we know are: it is some type a, and there is a Nameable a instance.
What is it possible to do with such a pair? Well, you can get the name of the a you have, and... that's it. That's all there is to it. In fact, there is an isomorphism. I will make a new data type so you can see how this isomorphism arises in cases when all your concrete objects have more structure than the class.
data ProtoNameable = ProtoNameable {
  -- one field for each typeclass method
  protoName :: String
}
instance Nameable ProtoNameable where
  name = protoName
toProto :: SomeNameable -> ProtoNameable
toProto (SomeNameable x) = ProtoNameable { protoName = name x }
fromProto :: ProtoNameable -> SomeNameable
fromProto = SomeNameable
As we can see, this fancy existential type SomeNameable has the same structure and information as ProtoNameable, which is isomorphic to String, so when you are using this lofty concept SomeNameable, you're really just saying String in a convoluted way. So why not just say String?
Your items definition has exactly the same information as this definition:
items = [ "Don", "Empire State" ]
I should add a few notes about this "protoization": it is only as straightforward as this when the typeclass you are existentially quantifying over has a certain structure: namely when it looks like an OO class.
class Foo a where
  method1 :: ... -> a -> ...
  method2 :: ... -> a -> ...
  ...
That is, each method only uses a once as an argument. If you have something like Num
class Num a where
  (+) :: a -> a -> a
  ...
which uses a in multiple argument positions, or as a result, then eliminating the existential is not as easy, but still possible. However my recommendation to do this changes from a frustration to a subtle context-dependent choice, because of the complexity and distant relationship of the two representations. However, every time I have seen existentials used in practice it is with the Foo kind of typeclass, where it only adds needless complexity, so I quite emphatically consider it an antipattern. In most of these cases I recommend eliminating the entire class from your codebase and exclusively using the protoized type (after you give it a good name).
Also, if you do need to downcast, then existentials aren't your man. You can either use an algebraic data type, as other people have answered, or you can use Data.Dynamic (which is basically an existential over Typeable). But don't do that; a Haskell programmer resorting to Dynamic is ungentlemanlike. An ADT is the way to go, where you characterize all the possible types it could be in one place (which is necessary so that the functions that do the "downcasting" know that they handle all possible cases).
I like @hammar's answer, and you should also check out this article which provides another example.
But, you might want to think differently about your types. The boxing of Nameable into the SomeNameable data type usually makes me start thinking about whether a union type for the specific case is meaningful.
data Entity = H Human | B Building
instance Nameable Entity where ...
items = [H (Human "Don"), B (Building "Town Hall")]
I'm not sure why you want to use the same function for
getting the name of a Human and the name of a Building.
If their names are used in fundamentally different ways,
except maybe for simple things like printing them,
then you probably want two
different functions for that. The type system
will automatically guide you to choose the right function
to use in each situation.
But if having a name is something significant about the
whole purpose of your program, and a Human and a Building
are really pretty much the same thing in that respect as far as your program
is concerned, then you would define their type together:
data NameableThing =
  Human { name :: String } |
  Building { name :: String }
That gives you a polymorphic function name that works for
whatever particular flavor of NameableThing you happen to have,
without needing to get into type classes.
Usually you would use a type class for a different kind of situation:
if you have some kind of non-trivial operation that has the same purpose
but a different implementation for several different types.
Even then, it's often better to use some other approach instead, like
passing a function as a parameter (a "higher order function", or "HOF").
Haskell type classes are a beautiful and powerful tool, but they are totally
different than what is called a "class" in object-oriented languages,
and they are used far less often.
And I certainly don't recommend complicating your program by using an advanced
extension to Haskell like Existential Quantification just to fit into
an object-oriented design pattern.
You can try to use existentially quantified types and do it like this:
data T = forall a. Nameable a => MkT a
items = [MkT (Human "bla"), MkT (Building "bla")]
I've just had a look at the code that this question is abstracting from. For this, I would recommend merging the Task and RecurringTaskDefinition types:
data Task
  = Once
    { name :: String
    , scheduled :: Maybe Day
    , category :: TaskCategory
    }
  | Recurring
    { name :: String
    , nextOccurrence :: Day
    , frequency :: RecurFrequency
    }
type ProgramData = [Task] -- don't even need a new data type for this any more
Then, the name function works just fine on either type, and the functions you were complaining about like deleteTask and deleteRecurring don't even need to exist -- you can just use the standard delete function as usual.

Resources