Type Matching in Haskell - haskell

If SomeType is defined as:
data SomeType = X {myBool :: Bool}
| Y {myString :: String}
| Z {myString :: String}
and I will update an arbitrary X, dependent of his type as follows:
changeST :: SomeType -> SomeType
changeST (X b) = (X True)
changeST (Y s) = (Y "newString")
changeST (Z s) = (Z "newString")
The third and the fourth line do the very same, they update the string in the given type.
Is there any way replace these two lines by a single one, eg. by assigning the type to a variable?

Not by assigning the type to a variable, but by doing field replacement:
changeST :: SomeType -> SomeType
changeST (X b) = (X True)
changeST st = st { myString = "newString" }
This returns the same st as its argument, but with the value of the myString field replaced. It's one of the nice features of fields that you can do this without caring which data constructor it is, as long as it's one of the data constructors that uses myString.

You can use Scrap-Your-Boilerplate for this.
{-# LANGUAGE DeriveDataTypeable #-}
import Data.Generics
data SomeType
= X { myBool :: Bool }
| Y { myString :: String }
| Z { myString :: String }
deriving (Data, Typeable)
changeST :: SomeType -> SomeType
changeST = everywhere (mkT (const True)) . everywhere (mkT (const "newString"))
This changeST changes every internal String in your structure to "newString" and every Bool to True.

I prefer Dan's solution, but pattern guards in GHC (standard in Haskell 2010) are a neat alternative to Michael's proposal:
{-# LANGUAGE PatternGuards #-}
changeST :: SomeType -> SomeType
changeST x | X _ <- x = X True
| Y _ <- x = Y newString
| Z _ <- x = Z newString
where newString = "newString"

Your three definitions of changeST are separate from each other, so the short answer is "no". There are, however, at least two ways you can do this.
Pattern match both the Y and Z constructors at once:
You can combine the 2nd and 3rd definition by making your pattern matching more general:
changeST x = x { myString = "newString"}
This creates a new version of x, whether it be a Y or a Z, replacing the string member. You have to be careful when doing this, though. If you later rename the string field of Z, for example, you will get runtime pattern match failures when calling changeST with a Z argument.
Use a case expression:
If you combine your three definitions into one, you can share data between them.
changeST :: SomeType -> SomeType
changeST x = case x of
X _ -> X True
Y _ -> Y newString
Z _ -> Z newString
where
newString = "newString"

Related

How to reduce code duplication when dealing with recursive sum types

I am currently working on a simple interpreter for a programming language and I have a data type like this:
data Expr
= Variable String
| Number Int
| Add [Expr]
| Sub Expr Expr
And I have many functions that do simple things like:
-- Substitute a value for a variable
substituteName :: String -> Int -> Expr -> Expr
substituteName name newValue = go
where
go (Variable x)
| x == name = Number newValue
go (Add xs) =
Add $ map go xs
go (Sub x y) =
Sub (go x) (go y)
go other = other
-- Replace subtraction with a constant with addition by a negative number
replaceSubWithAdd :: Expr -> Expr
replaceSubWithAdd = go
where
go (Sub x (Number y)) =
Add [go x, Number (-y)]
go (Add xs) =
Add $ map go xs
go (Sub x y) =
Sub (go x) (go y)
go other = other
But in each of these functions, I have to repeat the part that calls the code recursively with just a small change to one part of the function. Is there any existing way to do this more generically? I would rather not have to copy and paste this part:
go (Add xs) =
Add $ map go xs
go (Sub x y) =
Sub (go x) (go y)
go other = other
And just change a single case each time because it seems inefficient to duplicate code like this.
The only solution I could come up with is to have a function that calls a function first on the whole data structure and then recursively on the result like this:
recurseAfter :: (Expr -> Expr) -> Expr -> Expr
recurseAfter f x =
case f x of
Add xs ->
Add $ map (recurseAfter f) xs
Sub x y ->
Sub (recurseAfter f x) (recurseAfter f y)
other -> other
substituteName :: String -> Int -> Expr -> Expr
substituteName name newValue =
recurseAfter $ \case
Variable x
| x == name -> Number newValue
other -> other
replaceSubWithAdd :: Expr -> Expr
replaceSubWithAdd =
recurseAfter $ \case
Sub x (Number y) ->
Add [x, Number (-y)]
other -> other
But I feel like there should probably be a simpler way to do this already. Am I missing something?
Congratulations, you just rediscovered anamorphisms!
Here's your code, rephrased so that it works with the recursion-schemes package. Alas, it's not shorter, since we need some boilerplate to make the machinery work. (There might be some automagic way to avoid the boilerplate, e.g. using generics. I simply do not know.)
Below, your recurseAfter is replaced with the standard ana.
We first define your recursive type, as well as the functor it is the fixed point of.
{-# LANGUAGE DeriveFunctor, TypeFamilies, LambdaCase #-}
{-# OPTIONS -Wall #-}
module AnaExpr where
import Data.Functor.Foldable
data Expr
= Variable String
| Number Int
| Add [Expr]
| Sub Expr Expr
deriving (Show)
data ExprF a
= VariableF String
| NumberF Int
| AddF [a]
| SubF a a
deriving (Functor)
Then we connect the two with a few instances so that we can unfold Expr into the isomorphic ExprF Expr, and fold it back.
type instance Base Expr = ExprF
instance Recursive Expr where
project (Variable s) = VariableF s
project (Number i) = NumberF i
project (Add es) = AddF es
project (Sub e1 e2) = SubF e1 e2
instance Corecursive Expr where
embed (VariableF s) = Variable s
embed (NumberF i) = Number i
embed (AddF es) = Add es
embed (SubF e1 e2) = Sub e1 e2
Finally, we adapt your original code, and add a couple of tests.
substituteName :: String -> Int -> Expr -> Expr
substituteName name newValue = ana $ \case
Variable x | x == name -> NumberF newValue
other -> project other
testSub :: Expr
testSub = substituteName "x" 42 (Add [Add [Variable "x"], Number 0])
replaceSubWithAdd :: Expr -> Expr
replaceSubWithAdd = ana $ \case
Sub x (Number y) -> AddF [x, Number (-y)]
other -> project other
testReplace :: Expr
testReplace = replaceSubWithAdd
(Add [Sub (Add [Variable "x", Sub (Variable "y") (Number 34)]) (Number 10), Number 4])
An alternative could be to define ExprF a only, and then derive type Expr = Fix ExprF. This saves some of the boilerplate above (e.g. the two instances), at the cost of having to use Fix (VariableF ...) instead of Variable ..., as well as the analogous for the other constructors.
One could further alleviate that using pattern synonyms (at the cost of a little more boilerplate, though).
Update: I finally found the automagic tool, using template Haskell. This makes the whole code reasonably short. Note that the ExprF functor and the two instances above still exist under the hood, and we still have to use them. We only save the hassle of having to define them manually, but that alone saves a lot of effort.
{-# LANGUAGE DeriveFunctor, DeriveTraversable, TypeFamilies, LambdaCase, TemplateHaskell #-}
{-# OPTIONS -Wall #-}
module AnaExpr where
import Data.Functor.Foldable
import Data.Functor.Foldable.TH
data Expr
= Variable String
| Number Int
| Add [Expr]
| Sub Expr Expr
deriving (Show)
makeBaseFunctor ''Expr
substituteName :: String -> Int -> Expr -> Expr
substituteName name newValue = ana $ \case
Variable x | x == name -> NumberF newValue
other -> project other
testSub :: Expr
testSub = substituteName "x" 42 (Add [Add [Variable "x"], Number 0])
replaceSubWithAdd :: Expr -> Expr
replaceSubWithAdd = ana $ \case
Sub x (Number y) -> AddF [x, Number (-y)]
other -> project other
testReplace :: Expr
testReplace = replaceSubWithAdd
(Add [Sub (Add [Variable "x", Sub (Variable "y") (Number 34)]) (Number 10), Number 4])
As an alternative approach, this is also a typical use case for the uniplate package. It can use Data.Data generics rather than Template Haskell to generate the boilerplate, so if you derive Data instances for your Expr:
import Data.Data
data Expr
= Variable String
| Number Int
| Add [Expr]
| Sub Expr Expr
deriving (Show, Data)
then the transform function from Data.Generics.Uniplate.Data applies a function recursively to each nested Expr:
import Data.Generics.Uniplate.Data
substituteName :: String -> Int -> Expr -> Expr
substituteName name newValue = transform f
where f (Variable x) | x == name = Number newValue
f other = other
replaceSubWithAdd :: Expr -> Expr
replaceSubWithAdd = transform f
where f (Sub x (Number y)) = Add [x, Number (-y)]
f other = other
Note that in replaceSubWithAdd in particular, the function f is written to perform a non-recursive substitution; transform makes it recursive in x :: Expr, so it's doing the same magic to the helper function as ana does in #chi's answer:
> substituteName "x" 42 (Add [Add [Variable "x"], Number 0])
Add [Add [Number 42],Number 0]
> replaceSubWithAdd (Add [Sub (Add [Variable "x",
Sub (Variable "y") (Number 34)]) (Number 10), Number 4])
Add [Add [Add [Variable "x",Add [Variable "y",Number (-34)]],Number (-10)],Number 4]
>
This is no shorter than #chi's Template Haskell solution. One potential advantage is that uniplate provides some additional functions that may be helpful. For example, if you use descend in place of transform, it transforms only the immediate children which can give you control over where the recursion happens, or you can use rewrite to re-transform the result of transformations until you reach a fixed point. One potential disadvantage is that "anamorphism" sounds way cooler than "uniplate".
Full program:
{-# LANGUAGE DeriveDataTypeable #-}
import Data.Data -- in base
import Data.Generics.Uniplate.Data -- package uniplate
data Expr
= Variable String
| Number Int
| Add [Expr]
| Sub Expr Expr
deriving (Show, Data)
substituteName :: String -> Int -> Expr -> Expr
substituteName name newValue = transform f
where f (Variable x) | x == name = Number newValue
f other = other
replaceSubWithAdd :: Expr -> Expr
replaceSubWithAdd = transform f
where f (Sub x (Number y)) = Add [x, Number (-y)]
f other = other
replaceSubWithAdd1 :: Expr -> Expr
replaceSubWithAdd1 = descend f
where f (Sub x (Number y)) = Add [x, Number (-y)]
f other = other
main = do
print $ substituteName "x" 42 (Add [Add [Variable "x"], Number 0])
print $ replaceSubWithAdd e
print $ replaceSubWithAdd1 e
where e = Add [Sub (Add [Variable "x", Sub (Variable "y") (Number 34)])
(Number 10), Number 4]

Cleaner Alternative to Extensive Pattern Matching in Haskell

Right now, I have some code that essentially works like this:
data Expression
= Literal Bool
| Variable String
| Not Expression
| Or Expression Expression
| And Expression Expression
deriving Eq
simplify :: Expression -> Expression
simplify (Literal b) = Literal b
simplify (Variable s) = Variable s
simplify (Not e) = case simplify e of
(Literal b) -> Literal (not b)
e' -> Not e'
simplify (And a b) = case (simplify a, simplify b) of
(Literal False, _) -> Literal False
(_, Literal False) -> Literal False
(a', b') -> And a' b'
simplify (Or a b) = case (simplify a, simplify b) of
(Literal True, _) -> Literal True
(_, Literal True) -> Literal True
(a', b') -> Or a' b'
And many more such patterns regarding all the ways one can simplify a boolean expression. As I add more operators and rules however, this grows immensely and feels.. clunky. Especially so since some rules need to be added twice to account for commutativity.
How can I nicely refactor lots and lots of patterns of which some (most, I'd say) are even symmetrical (take the And and Or patterns for example)?
Right now, adding a rule to simplify And (Variable "x") (Not (Variable "x")) to Literal False requires me to add two nested rules, which is all but optimal.
Basically the problem is that you have to write out simplify of the subexpressions in each clause, over and over again. It would be better to first get all the subexpressions done before even considering laws involving the top-level operator. One simple way is to add an auxiliary version of simplify, that doesn't recurse down:
simplify :: Expression -> Expression
simplify (Literal b) = Literal b
simplify (Variable s) = Variable s
simplify (Not e) = simplify' . Not $ simplify e
simplify (And a b) = simplify' $ And (simplify a) (simplify b)
simplify (Or a b) = simplify' $ Or (simplify a) (simplify b)
simplify' :: Expression -> Expression
simplify' (Not (Literal b)) = Literal $ not b
simplify' (And (Literal False) _) = Literal False
...
With the only small amount of operations you have in booleans, this is probably the most sensible approach. However with more operations, the duplication in simplify might still be worth to avoid. To that end, you can conflate the unary and binary operations to a common constructor:
data Expression
= Literal Bool
| Variable String
| BoolPrefix BoolPrefix Expression
| BoolInfix BoolInfix Expression Expression
deriving Eq
data BoolPrefix = Negation
data BoolInfix = AndOp | OrOp
and then you have just
simplify (Literal b) = Literal b
simplify (Variable s) = Variable s
simplify (BoolPrefix bpf e) = simplify' . BoolPrefix bpf $ simplify e
simplify (BoolInfix bifx a b) = simplify' $ BoolInfix bifx (simplify a) (simplify b)
Obviously this makes simplify' more awkward though, so perhaps not such a good idea. You can however get around this syntactical overhead by defining specialised pattern synonyms:
{-# LANGUAGE PatternSynonyms #-}
pattern Not :: Expression -> Expression
pattern Not x = BoolPrefix Negation x
infixr 3 :∧
pattern (:∧) :: Expression -> Expression -> Expression
pattern a:∧b = BoolInfix AndOp a b
infixr 2 :∨
pattern (:∨) :: Expression -> Expression -> Expression
pattern a:∨b = BoolInfix OrOp a b
For that matter, perhaps also
pattern F, T :: Expression
pattern F = Literal False
pattern T = Literal True
With that, you can then write
simplify' :: Expression -> Expression
simplify' (Not (Literal b)) = Literal $ not b
simplify' (F :∧ _) = F
simplify' (_ :∧ F) = F
simplify' (T :∨ _) = T
simplify' (a :∧ Not b) | a==b = T
...
I should add a caveat though: when I tried something similar to those pattern synonyms, not for booleans but affine mappings, it made the compiler extremely slow. (Also, GHC-7.10 didn't yet support polymorphic pattern synonyms yet; this has changed quite a bit as of now.)
Note also that all this will not generally yield the simplest possible form –
for that, you'd need to find the fixed point of simplify.
One thing you can do is simplify as you construct, rather than first constructing then repeatedly destructing. So:
module Simple (Expression, true, false, var, not, or, and) where
import Prelude hiding (not, or, and)
data Expression
= Literal Bool
| Variable String
| Not Expression
| Or Expression Expression
| And Expression Expression
deriving (Eq, Ord, Read, Show)
true = Literal True
false = Literal False
var = Variable
not (Literal True) = false
not (Literal False) = true
not x = Not x
or (Literal True) _ = true
or _ (Literal True) = true
or x y = Or x y
and (Literal False) _ = false
and _ (Literal False) = false
and x y = And x y
We can try it out in ghci:
> and (var "x") (and (var "y") false)
Literal False
Note that the constructors are not exported: this ensures that folks constructing one of these can't avoid the simplification process. Actually, this may be a drawback; occasionally it is nice to see the "full" form. A standard approach to dealing with this is to make the exported smart constructors part of a type-class; you can either use them to build a "full" form or a "simplified" way. To avoid having to define the type twice, we could either use a newtype or a phantom parameter; I'll elect for the latter here to reduce the noise in pattern-matching.
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE KindSignatures #-}
module Simple (Format(..), true, false, var, not, or, and) where
import Prelude hiding (not, or, and)
data Format = Explicit | Simplified
data Expression (a :: Format)
= Literal Bool
| Variable String
| Not (Expression a)
| Or (Expression a) (Expression a)
| And (Expression a) (Expression a)
deriving (Eq, Ord, Read, Show)
class Expr e where
true, false :: e
var :: String -> e
not :: e -> e
or, and :: e -> e -> e
instance Expr (Expression Explicit) where
true = Literal True
false = Literal False
var = Variable
not = Not
or = Or
and = And
instance Expr (Expression Simplified) where
true = Literal True
false = Literal False
var = Variable
not (Literal True) = false
not (Literal False) = true
not x = Not x
or (Literal True) _ = true
or _ (Literal True) = true
or x y = Or x y
and (Literal False) _ = false
and _ (Literal False) = false
and x y = And x y
Now in ghci we can "run" the same term in two different ways:
> :set -XDataKinds
> and (var "x") (and (var "y") false) :: Expression Explicit
And (Variable "x") (And (Variable "y") (Literal False))
> and (var "x") (and (var "y") false) :: Expression Simplified
Literal False
You might want to add more rules later; for example, maybe you want:
and (Variable x) (Not (Variable y)) | x == y = false
and (Not (Variable x)) (Variable y) | x == y = false
Having to repeat both "orders" of patterns is a bit annoying. We should abstract over that! The data declaration and classes will be the same, but we'll add the helper function eitherOrder and use it in the definitions of and and or. Here's a more complete set of simplifications using this idea (and our final version of the module):
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE KindSignatures #-}
module Simple (Format(..), true, false, var, not, or, and) where
import Data.Maybe
import Data.Monoid
import Prelude hiding (not, or, and)
import Control.Applicative ((<|>))
data Format = Explicit | Simplified
data Expression (a :: Format)
= Literal Bool
| Variable String
| Not (Expression a)
| Or (Expression a) (Expression a)
| And (Expression a) (Expression a)
deriving (Eq, Ord, Read, Show)
class Expr e where
true, false :: e
var :: String -> e
not :: e -> e
or, and :: e -> e -> e
instance Expr (Expression Explicit) where
true = Literal True
false = Literal False
var = Variable
not = Not
or = Or
and = And
eitherOrder :: (e -> e -> e)
-> (e -> e -> Maybe e)
-> e -> e -> e
eitherOrder fExplicit fSimplified x y = fromMaybe
(fExplicit x y)
(fSimplified x y <|> fSimplified y x)
instance Expr (Expression Simplified) where
true = Literal True
false = Literal False
var = Variable
not (Literal True) = false
not (Literal False) = true
not (Not x) = x
not x = Not x
or = eitherOrder Or go where
go (Literal True) _ = Just true
go (Literal False) x = Just x
go (Variable x) (Variable y) | x == y = Just (var x)
go (Variable x) (Not (Variable y)) | x == y = Just true
go _ _ = Nothing
and = eitherOrder And go where
go (Literal True) x = Just x
go (Literal False) _ = Just false
go (Variable x) (Variable y) | x == y = Just (var x)
go (Variable x) (Not (Variable y)) | x == y = Just false
go _ _ = Nothing
Now in ghci we can do more complicated simplifications, like:
> and (not (not (var "x"))) (var "x") :: Expression Simplified
Variable "x"
And even though we only wrote one order of the rewrite rule, both orders work properly:
> and (not (var "x")) (var "x") :: Expression Simplified
Literal False
> and (var "x") (not (var "x")) :: Expression Simplified
Literal False
I think Einstein said, "Simplify as much as possible, but no more." You have yourself a complicated datatype, and a correspondingly complicated concept, so I assume any technique can only be so much cleaner for the problem at hand.
That said, the first option is to use instead a case structure.
simplify x = case x of
Literal _ -> x
Variable _ -> x
Not e -> simplifyNot $ simplify e
...
where
sharedFunc1 = ...
sharedFunc2 = ...
This has the added benefit of including shared functions which will be usable by all cases but not at the top level namespace. I also like how the cases are freed of their parenthesis. (Also note that in the first two cases i just return the original term, not creating a new one). I often use this sort of structure to just break out other simplify functions, as in the case of Not.
This problem in particular may lend itself to basing Expression on an underlying functor, so that you may fmap a simplification of the subexpressions and then perform the specific combinations of the given case. It will look something like the following:
simplify :: Expression' -> Expression'
simplify = Exp . reduce . fmap simplify . unExp
The steps in this are unwrapping Expression' into the underlying functor representation, mapping the simplification on the underlying term, and then reducing that simplification and wrapping back up into the new Expression'
{-# Language DeriveFunctor #-}
newtype Expression' = Exp { unExp :: ExpressionF Expression' }
data ExpressionF e
= Literal Bool
| Variable String
| Not e
| Or e e
| And e e
deriving (Eq,Functor)
Now, I have pushed the complexity off into the reduce function, which is only a little less complex because it doesn't have to worry about first reducing the subterm. But it will now contain solely the business logic of combining one term with another.
This may or may not be a good technique for you, but it may make some enhancements easier. For instance, if it is possible to form invalid expressions in your language, you could simplify that with Maybe valued failures.
simplifyMb :: Expression' -> Maybe Expression'
simplifyMb = fmap Exp . reduceMb <=< traverse simplifyMb . unExp
Here, traverse will apply simplfyMb to the subterms of the ExpressionF, resulting in an expression of Maybe subterms, ExpressionF (Maybe Expression'), and then if any subterms are Nothing, it will return Nothing, if all are Just x, it will return Just (e::ExpressionF Expression'). Traverse isn't actually separated into distinct phases like that, but it's easier to explain as if it were. Also note, you will need language pragmas for DeriveTraversable and DeriveFoldable, as well as deriving statements on the ExpressionF data type.
The downside? Well, for one the dirt of your code will then lie in a bunch of Exp wrappers everywhere. Consider the application of simplfyMb of the simple term below:
simplifyMb (Exp $ Not (Exp $ Literal True))
It's also a lot to get a head around, but if you understand traverse and fmap pattern above, you can reuse it in lots of places, so that's good. I also believe defining simplify in that way makes it more robust to whatever the specific ExpressionF constructions may turn into. It doesn't mention them so the deep simplification will be unaffected by refactors. The reduce function on the other hand will be.
Carrying on with your Binary Op Expression Expression idea, we could have the datatype:
data Expression
= Literal Bool
| Variable String
| Not Expression
| Binary Op Expression Expression
deriving Eq
data Op = Or | And deriving Eq
And an auxiliary function
{-# LANGUAGE ViewPatterns #-}
simplifyBinary :: Op -> Expression -> Expression -> Expression
simplifyBinary binop (simplify -> leftexp) (simplify -> rightexp) =
case oneway binop leftexp rightexp ++ oneway binop rightexp leftexp of
simplified : _ -> simplified
[] -> Binary binop leftexp rightexp
where
oneway :: Op -> Expression -> Expression -> [Expression]
oneway And (Literal False) _ = [Literal False]
oneway Or (Literal True) _ = [Literal True]
-- more cases here
oneway _ _ _ = []
The idea is that you would put the simplification cases in oneway and then simplifyBinary would take care of reversing the arguments, to avoid having to write the symmetric cases.
You could write a generic simplifier for all binary operations:
simplifyBinWith :: (Bool -> Bool -> Bool) -- the boolean operation
-> (Expression -> Expression -> Expression) -- the constructor
-> Expression -> Expression -- the two operands
-> Expression) -- the simplified result
simplifyBinWith op cons a b = case (simplify a, simplify b) of
(Literal x, Literal y) -> Literal (op x y)
(Literal x, b') -> tryAll (x `op`) b'
(a', Literal y) -> tryAll (`op` y) a'
(a', b') -> cons a' b'
where
tryAll f term = case (f True, f False) of -- what would f do if term was true of false
(True, True) -> Literal True
(True, False) -> term
(False, True) -> Not term
(False, False) -> Literal False
That way, your simplify function would become
simplify :: Expression -> Expression
simplify (Not e) = case simplify e of
(Literal b) -> Literal (not b)
e' -> Not e'
simplify (And a b) = simplifyBinWith (&&) And a b
simplify (Or a b) = simplifyBinWith (||) Or a b
simplify t = t
and could be easily extended to more binary operations. It would also work well with the Binary Op Expression Expression idea, you'd pass Op instead of an Expression constructor to simplifyBinWith and the pattern in simplify could be generalised:
simplify :: Expression -> Expression
simplify (Not e) = case simplify e of
(Literal b) -> Literal (not b)
e' -> Not e'
simplify (Binary op a b) = simplifyBinWith (case op of
And -> (&&)
Or -> (||)
Xor -> (/=)
Implies -> (<=)
Equals -> (==)
…
) op a b
simplify t = t
where
simplifyBinWith f op a b = case (simplify a, simplify b) of
(Literal x, Literal y) -> Literal (f x y)
…
(a', b') -> Binary op a' b'

GHC Calling Convention for Sum Type Function Arguments

Does GHC ever unpack sum types when passing them to functions? For example, let's say that we have the following type:
data Foo
= Foo1 {-# UNPACK #-} !Int {-# UNPACK #-} !Word
| Foo2 {-# UNPACK #-} !Int
| Foo3 {-# UNPACK #-} !Word
Then I define a function that is strict in its Foo argument:
consumeFoo :: Foo -> Int
consumeFoo x = case x of ...
At runtime, when I call consumeFoo, what can I expect to happen? The GHC calling convention is to pass arguments in registers (or on the stack once there are too many). I can see two ways that the argument passing could go:
A pointer to a Foo on the heap gets passed in as one argument.
A three-argument representation of Foo is used, one argument representing the data constructor that was used and the other two representing the possible Int and Word values in the data constructor.
I would prefer the second representation, but I don't know if it is actually what happens. I am aware of UnpackedSumTypes landing in GHC 8.2, but it's unclear if it does what I want. If I had instead written the function as:
consumeFooAlt :: (# (# Int#, Word# #) | Int# | Word# #) -> Int
Then I would expect that evaluation (2) would be what happens. And the Unpacking section of the unpacked sums page indicates that I could do this as well:
data Wrap = Wrap {-# UNPACK #-} !Foo
consumeFooAlt2 :: Wrap -> Int
And that should also have the representation I want, I think.
So my question is, without using a wrapper type or a raw unpacked sum, how can I guarentee that a sum is unpacked into registers (or onto the stack) when I pass it as an argument to a function? If it is possible, is it something that GHC 8.0 can already do, or is it something that will only be available in GHC 8.2?
First: Guaranteed optimization and GHC don't mix well. Due to the high level it is very hard to predict the code that GHC will generate in every case. The only way to be sure is to look at the Core. If you're developing an extremely performance dependent application with GHC, then you need to become familar with Core I.
I am not aware of any optimization in GHC that does exactly what you describe. Here is an example program:
module Test where
data Sum = A {-# UNPACK #-} !Int | B {-# UNPACK #-} !Int
consumeSum :: Sum -> Int
consumeSum x = case x of
A y -> y + 1
B y -> y + 2
{-# NOINLINE consumeSumNoinline #-}
consumeSumNoinline = consumeSum
{-# INLINE produceSumInline #-}
produceSumInline :: Int -> Sum
produceSumInline x = if x == 0 then A x else B x
{-# NOINLINE produceSumNoinline #-}
produceSumNoinline :: Int -> Sum
produceSumNoinline x = if x == 0 then A x else B x
test :: Int -> Int
--test x = consumeSum (produceSumInline x)
test x = consumeSumNoinline (produceSumNoinline x)
Let's first look at what happens if we don't inline consumeSum nor produceSum. Here is the core:
test :: Int -> Int
test = \ (x :: Int) -> consumeSumNoinline (produceSumNoinline x)
(produced with ghc-core test.hs -- -dsuppress-unfoldings -dsuppress-idinfo -dsuppress-module-prefixes -dsuppress-uniques)
Here, we can see that GHC (8.0 in this case) does not unbox the sum type passed as a function argument. Nothing changes if we inline either consumeSum or produceSum.
If we inline both however, then the following code is generated:
test :: Int -> Int
test =
\ (x :: Int) ->
case x of _ { I# x1 ->
case x1 of wild1 {
__DEFAULT -> I# (+# wild1 2#);
0# -> lvl1
}
}
What happened here is that through inlining, GHC ends up with:
\x -> case (if x == 0 then A x else B x) of
A y -> y + 1
B y -> y + 2
Which through the case-of-case (if is just a special case) is turned into:
\x -> if x == 0 then case (A x) of ... else case (B x) of ...
Now that is a case with a known constructor, so GHC can reduce the case at compile time ending up with:
\x -> if x == 0 then x + 1 else x + 2
So it completely eliminated the constructor.
In summary, I believe that GHC does not have any concept of an "unboxed sum" type prior to version 8.2, which also applies to function arguments. The only way to get "unboxed" sums is to get the constructor eliminated completely through inlining.
If you need such an optimization, your simplest solution is to do it yourself.
I think there are actually many ways to achieve this, but one is:
data Which = Left | Right | Both
data Foo = Foo Which Int Word
The unpacking of any fields of this type is completely irrelevant to the question of the 'shape of the representation', which is what you are really asking about. Enumerations are already highly optimized - only one value for every constructor is ever created - so the addition of this field doesn't affect performance. The unpacked representation of this type is precisely what you want - one word for Which constructor and one for each field.
If you write your functions in the proper way, you get the proper code:
data Which = Lft | Rgt | Both
data Foo = Foo Which {-# UNPACK #-} !Int {-# UNPACK #-} !Word
consumeFoo :: Foo -> Int
consumeFoo (Foo w l r) =
case w of
Lft -> l
Rgt -> fromIntegral r
Both -> l + fromIntegral r
The generated core is quite obvious:
consumeFoo :: Foo -> Int
consumeFoo =
\ (ds :: Foo) ->
case ds of _ { Foo w dt dt1 ->
case w of _ {
Lft -> I# dt;
Rgt -> I# (word2Int# dt1);
Both -> I# (+# dt (word2Int# dt1))
}
}
However, for simple programs such as:
consumeFoos = foldl' (+) 0 . map consumeFoo
This optimization makes no difference. As is indicated in the other answer, the inner function consumeFoo is just inlined:
Rec {
$wgo :: [Foo] -> Int# -> Int#
$wgo =
\ (w :: [Foo]) (ww :: Int#) ->
case w of _ {
[] -> ww;
: y ys ->
case y of _ {
Lft dt -> $wgo ys (+# ww dt);
Rgt dt -> $wgo ys (+# ww (word2Int# dt));
Both dt dt1 -> $wgo ys (+# ww (+# dt (word2Int# dt1)))
}
}
end Rec }
vs.
Rec {
$wgo :: [Foo] -> Int# -> Int#
$wgo =
\ (w :: [Foo]) (ww :: Int#) ->
case w of _ {
[] -> ww;
: y ys ->
case y of _ { Foo w1 dt dt1 ->
case w1 of _ {
Lft -> $wgo ys (+# ww dt);
Rgt -> $wgo ys (+# ww (word2Int# dt1));
Both -> $wgo ys (+# ww (+# dt (word2Int# dt1)))
}
}
}
end Rec }
Which, in almost every case when working with low-level, unpacked data, is the goal anyways, as most of your functions are small and cost little to inline.

haskell, case of like fmap

I have a function of type foo :: a -> a -> Either String TypeConstructor
foo can return both throwError String and something of TypeConstructor.
I would like to do something like fmap. I mean that I would like to case (foo x y z) of ... where ... means different values (it depends on used constructor value in foo).
Is there exista any way to do it ?
A basic way could be to pattern match everything
case foo x y of
Left string -> ...
Right (Cons1 z w) -> ...
Right (Cons2 a b c) -> ...
Right Cons3 -> ...
where Cons1, ... are the value constructors of TypeConstructor.
You can't directly write
case (foo x y) of ...
and then pattern match on the type constructors of TypeConstructor since foo x y does not have the correct type (it is Either String TypeConstructor).
However, you can define a function which pattern matches on the type constructors of TypeConstructor and then fmap this over the result of foo x y as shown below.
import Control.Monad.Except (throwError)
data Type = Int Int
| Str String
deriving (Show)
foo :: Int -> String -> Either String Type
foo n s =
if n == 0
then throwError "Zero"
else return (Str s)
bar x y = fmap f (foo x y)
where
f a = case a of
Int n -> Int (n + x)
Str s -> Str (s ++ y)

Haskell class instance

It's my first exercise to understand the classes in Haskell. My problem is how to define the functions that I have declared in the class and how to test them by terminal ghci.
I explain step by step what I did:
type Point2d = (Int, Int) -- point
type Vector2d = (Int, Int) -- vector
data Shape =
Line Point2d Point2d
| Triangle Point2d Point2d Point2d
deriving (Eq, Show)
class ClassShape s where
name :: s -> String
perim :: s -> Int -- given a CShape calculates the perimeter
move :: s -> Vector2d -> s
Now, I declare s as ClassShape instance, by implementing the corresponding functions.
nameShape :: Shape s -> String
nameShape Line = "Line"
nameShape Triangle = "Triangle"
perimShape :: Shape s -> Int
perimShape Line a b = 999 -- ...
perimShape Triangle a b c = 999 -- ...
Here's my problem: how should I declare the functions? I just need to see an "example" to understand the concept.
The error that Haskell returns is:
`Shape' is applied to too many type arguments
In the type signature for `nameShape':
nameShape :: Shape s -> String
`Shape' is applied to too many type arguments
In the type signature for `perimShape':
perimShape :: Shape s -> Int
Then, how do I test the program on Haskell?
Thanks to all.
Note that nameShape function will not work, because there is no Shape s type defined. Remember that s is a type variable. Only if you have defined Shape s type constructor you can use them. You have defined Shape type in your definition but not Shape s. For defining instance of typeclass, you have to do something like this:
instance ClassShape Shape where
name (Line _ _) = "Line"
name (Triangle _ _ _) = "Triangle"
perim (Line x y) = undefined -- Calculate perimiter using x and y
perim (Triangle x y z) = undefined
move (Line x y) = undefined
move (Triangle x y z) = undefined
You have to fill the undefined with working parts.
You're making a common confusion of early Haskell programmers: using two different things which work in related ways (sum types and classes) to do the same thing in two different ways. Thus there are two problems: the "little" problem (what does this error mean?) and the "big" problem (why is your code shaped like this?).
The Little Problem
You wrote Shape s when you meant to just write Shape. The way you have defined Shape, it has kind * (that is, it is a concrete type) rather than kind * -> *, which is the kind of adjectives -- things like "a list of" or "a pair of" which are abstract until you give them a concrete type to modify ("a list of strings" is concrete, "a list of" is abstract). When you write Shape s you are applying Shape as an adjective to a type variable s but it's not an adjective; it's a noun.
That is why you get the error:
`Shape' is applied to too many type arguments
Side note: you may be used to languages where the error message usually is not very well-related to the actual problem. In Haskell usually the compiler tells you exactly what is wrong, as it did in this case.
The Big Problem
Type classes are collections of unrelated types which can do the same things. The type class syntax passes an implicit context as a "constraint", this context can be implicit because it belongs to the type.
You may need to read that last paragraph a few times in a quiet corner. Basically I mean to say that you can do the same thing as a type class with a data constructor for the context:
data EqOrd s = EqOrdLib {getEq :: s -> s -> Bool, getCmp :: s -> s -> Ordering}
-- this is just provided to us as a primitive by Haskell
intEOL :: EqOrd Int
intEOL = EqOrdLib (==) compare
-- but we can then define things like this:
listEOL :: EqOrd x -> EqOrd [x]
listEOL (EqOrdLib base_eq base_cmp) = EqOrdLib list_eq list_cmp where
list_cmp [] [] = EQ
list_cmp (_:_) [] = GT
list_cmp [] (_:_) = LT
list_cmp (x:xs) (y:ys) = case base_cmp x y of
LT -> LT
GT -> GT
EQ -> list_cmp xs ys
list_eq xs ys = list_cmp xs ys == EQ
Now to use that sort of context, you would have to write explicitly:
quicksort :: EqOrd x -> [x] -> [x]
quicksort _ [] = []
quicksort lib (p:els) = quicksort lib lesser ++ [p] ++ quicksort lib greater
where cmp = getCmp lib
p_less_than x = cmp x p == LT
p_gte x = not . p_less_than
greater = filter p_less_than els
lesser = filter p_gte els
See, we explicitly pass in this library of functions lib and explicitly pull out the comparison function cmp = getCmp lib.
Type classes allow us to implicitly pass the library of functions, by stating up-front that the type itself only has one such library. We pass the library as a "constraint", so instead of EqOrd x -> [x] -> [x] you write Ord x => [x] -> [x] with the "fat arrow" of constraints. But secretly it means "when you ask me to use the < function on two values of type x, I know implicitly what library to get that function from and will get that function for you."
Now: you have one type, Shape, so you don't need typeclasses. (Go back to the first paragraph above: Type classes are collections of unrelated types which can do the same things.
If you want to do type classes then instead of the sum-type for Shape, let's define n-dimensional vectors of different types:
class Vector v where
(*.) :: (Num r) => r -> v r -> v r
(.+.) :: (Num r) => v r -> v r -> v r
norm :: (Num r, Floating r) => v r -> r
-- another advantage of type classes is *default declarations* like:
(.-.) :: (Num r) => v r -> v r -> v r
v1 .-. v2 = v1 .+. (-1 *. v2)
data V2D r = V2D r r deriving (Eq, Show)
instance Vector V2D where
s *. V2D x y = V2D (s * x) (s * y)
V2D x1 y1 .+. V2D x2 y2 = V2D (x1 + x2) (y1 + y2)
norm (V2D x y) = sqrt (x^2 + y^2)
data V3D r = V3D r r r deriving (Eq, Show)
instance Vector V3D where
s *. V3D x y z = V3D (s * x) (s * y) (s * z)
V3D x1 y1 z1 .+. V3D x2 y2 z2 = V3D (x1 + x2) (y1 + y2) (z1 + z2)
norm (V3D x y z) = sqrt (x^2 + y^2 + z^2)
Then we can write things like:
newtype GeneralPolygon v r = Poly [v r]
perimeter :: (Num r, Floating r, Vector v) -> GeneralPolygon v r -> r
perimeter (Poly []) = 0
perimeter (Poly (x : xs)) = foldr (+) 0 (map norm (zipWith (.-.) (x : xs) (xs ++ [x])))
translate :: (Vector v, Num r) => GeneralPolygon v r -> v r -> GeneralPolygon v r
translate (Poly xs) v = Poly (map (v .+.) xs)
Making Typeclasses Work For You
Now if you really want to, you can also unwrap your sum-type data declaration into a bunch of data declarations:
data Line = Line Point2d Point2d deriving (Eq, Show)
data Square = Square Point2d Point2d deriving (Eq, Show)
data Triangle = Triangle Point2d Point2d Point2d deriving (Eq, Show)
Now you can do something simple like:
class Shape s where
perim :: s -> Int
move :: s -> Vector2d -> s
Although I should say, you'll run into a problem when you want to do square roots for perimeters (sqrt is in the Floating typeclass, which Int does not have functions for, you'll want to change Int to Double or something).

Resources