The limit set of types with new data like `Tree a` - haskell

Exploring and studing type system in Haskell I've found some problems.
1) Let's consider polymorphic type as Binary Tree:
data Tree a = Leaf a | Branch (Tree a) (Tree a) deriving Show
And, for example, I want to limit my considerations only with Tree Int, Tree Bool and Tree Char. Of course, I can make a such new type:
data TreeIWant = T1 (Tree Int) | T2 (Tree Bool) | T3 (Tree Char) deriving Show
But could it possible to make new restricted type (for homogeneous trees) in more elegant (and without new tags like T1,T2,T3) way (perhaps with some advanced type extensions)?
2) Second question is about trees with heterogeneous values. I can do them with usual Haskell, i.e. I can do the new helping type, contained tagged heterogeneous values:
data HeteroValues = H1 Int | H2 Bool | H3 Char deriving Show
and then make tree with values of this type:
type TreeH = Tree HeteroValues
But could it possible to make new type (for heterogeneous trees) in more elegant (and without new tags like H1,H2,H3) way (perhaps with some advanced type extensions)?
I know about heterogeneous list, perhaps it is the same question?

For question #2, it's easy to construct a "restricted" heterogeneous type without explicit tags using a GADT and a type class:
{-# LANGUAGE GADTs #-}
data Thing where
T :: THING a => a -> Thing
class THING a
Now, declare THING instances for the the things you want to allow:
instance THING Int
instance THING Bool
instance THING Char
and you can create Things and lists (or trees) of Things:
> t1 = T 'a' -- Char is okay
> t2 = T "hello" -- but String is not
... type error ...
> tl = [T (42 :: Int), T True, T 'x']
> tt = Branch (Leaf (T 'x')) (Leaf (T False))
>
In terms of the type names in your question, you have:
type HeteroValues = Thing
type TreeH = Tree Thing
You can use the same type class with a new GADT for question #1:
data ThingTree where
TT :: THING a => Tree a -> ThingTree
and you have:
type TreeIWant = ThingTree
and you can do:
> tt1 = TT $ Branch (Leaf 'x') (Leaf 'y')
> tt2 = TT $ Branch (Leaf 'x') (Leaf False)
... type error ...
>
That's all well and good, until you try to use any of the values you've constructed. For example, if you wanted to write a function to extract a Bool from a possibly boolish Thing:
maybeBool :: Thing -> Maybe Bool
maybeBool (T x) = ...
you'd find yourself stuck here. Without a "tag" of some kind, there's no way of determining if x is a Bool, Int, or Char.
Actually, though, you do have an implicit tag available, namely the THING type class dictionary for x. So, you can write:
maybeBool :: Thing -> Maybe Bool
maybeBool (T x) = maybeBool' x
and then implement maybeBool' in your type class:
class THING a where
maybeBool' :: a -> Maybe Bool
instance THING Int where
maybeBool' _ = Nothing
instance THING Bool where
maybeBool' = Just
instance THING Char where
maybeBool' _ = Nothing
and you're golden!
Of course, if you'd used explicit tags:
data Thing = T_Int Int | T_Bool Bool | T_Char Char
then you could skip the type class and write:
maybeBool :: Thing -> Maybe Bool
maybeBool (T_Bool x) = Just x
maybeBool _ = Nothing
In the end, it turns out that the best Haskell representation of an algebraic sum of three types is just an algebraic sum of three types:
data Thing = T_Int Int | T_Bool Bool | T_Char Char
Trying to avoid the need for explicit tags will probably lead to a lot of inelegant boilerplate elsewhere.
Update: As #DanielWagner pointed out in a comment, you can use Data.Typeable in place of this boilerplate (effectively, have GHC generate a lot of boilerplate for you), so you can write:
import Data.Typeable
data Thing where
T :: THING a => a -> Thing
class Typeable a => THING a
instance THING Int
instance THING Bool
instance THING Char
maybeBool :: Thing -> Maybe Bool
maybeBool = cast
This perhaps seems "elegant" at first, but if you try this approach in real code, I think you'll regret losing the ability to pattern match on Thing constructors at usage sites (and so having to substitute chains of casts and/or comparisons of TypeReps).

Related

Deriving Eq and Show for an ADT that contains fields that can't have Eq or Show

I'd like to be able to derive Eq and Show for an ADT that contains multiple fields. One of them is a function field. When doing Show, I'd like it to display something bogus, like e.g. "<function>"; when doing Eq, I'd like it to ignore that field. How can I best do this without hand-writing a full instance for Show and Eq?
I don't want to wrap the function field inside a newtype and write my own Eq and Show for that - it would be too bothersome to use like that.
One way you can get proper Eq and Show instances is to, instead of hard-coding that function field, make it a type parameter and provide a function that just “erases” that field. I.e., if you have
data Foo = Foo
{ fooI :: Int
, fooF :: Int -> Int }
you change it to
data Foo' f = Foo
{ _fooI :: Int
, _fooF :: f }
deriving (Eq, Show)
type Foo = Foo' (Int -> Int)
eraseFn :: Foo -> Foo' ()
eraseFn foo = foo{ fooF = () }
Then, Foo will still not be Eq- or Showable (which after all it shouldn't be), but to make a Foo value showable you can just wrap it in eraseFn.
Typically what I do in this circumstance is exactly what you say you don’t want to do, namely, wrap the function in a newtype and provide a Show for that:
data T1
{ f :: X -> Y
, xs :: [String]
, ys :: [Bool]
}
data T2
{ f :: OpaqueFunction X Y
, xs :: [String]
, ys :: [Bool]
}
deriving (Show)
newtype OpaqueFunction a b = OpaqueFunction (a -> b)
instance Show (OpaqueFunction a b) where
show = const "<function>"
If you don’t want to do that, you can instead make the function a type parameter, and substitute it out when Showing the type:
data T3' a
{ f :: a
, xs :: [String]
, ys :: [Bool]
}
deriving (Functor, Show)
newtype T3 = T3 (T3' (X -> Y))
data Opaque = Opaque
instance Show Opaque where
show = const "..."
instance Show T3 where
show (T3 t) = show (Opaque <$ t)
Or I’ll refactor my data type to derive Show only for the parts I want to be Showable by default, and override the other parts:
data T4 = T4
{ f :: X -> Y
, xys :: T4' -- Move the other fields into another type.
}
instance Show T4 where
show (T4 f xys) = "T4 <function> " <> show xys
data T4' = T4'
{ xs :: [String]
, ys :: [Bool]
}
deriving (Show) -- Derive ‘Show’ for the showable fields.
Or if my type is small, I’ll use a newtype instead of data, and derive Show via something like OpaqueFunction:
{-# LANGUAGE DerivingVia #-}
newtype T5 = T5 (X -> Y, [String], [Bool])
deriving (Show) via (OpaqueFunction X Y, [String], [Bool])
You can use the iso-deriving package to do this for data types using lenses if you care about keeping the field names / record accessors.
As for Eq (or Ord), it’s not a good idea to have an instance that equates values that can be observably distinguished in some way, since some code will treat them as identical and other code will not, and now you’re forced to care about stability: in some circumstance where I have a == b, should I pick a or b? This is why substitutability is a law for Eq: forall x y f. (x == y) ==> (f x == f y) if f is a “public” function that upholds the invariants of the type of x and y (although floating-point also violates this). A better choice is something like T4 above, having equality only for the parts of a type that can satisfy the laws, or explicitly using comparison modulo some function at use sites, e.g., comparing someField.
The module Text.Show.Functions in base provides a show instance for functions that displays <function>. To use it, just:
import Text.Show.Functions
It just defines an instance something like:
instance Show (a -> b) where
show _ = "<function>"
Similarly, you can define your own Eq instance:
import Text.Show.Functions
instance Eq (a -> b) where
-- all functions are equal...
-- ...though some are more equal than others
_ == _ = True
data Foo = Foo Int Double (Int -> Int) deriving (Show, Eq)
main = do
print $ Foo 1 2.0 (+1)
print $ Foo 1 2.0 (+1) == Foo 1 2.0 (+2) -- is True
This will be an orphan instance, so you'll get a warning with -Wall.
Obviously, these instances will apply to all functions. You can write instances for a more specialized function type (e.g., only for Int -> String, if that's the type of the function field in your data type), but there is no way to simultaneously (1) use the built-in Eq and Show deriving mechanisms to derive instances for your datatype, (2) not introduce a newtype wrapper for the function field (or some other type polymorphism as mentioned in the other answers), and (3) only have the function instances apply to the function field of your data type and not other function values of the same type.
If you really want to limit applicability of the custom function instances without a newtype wrapper, you'd probably need to build your own generics-based solution, which wouldn't make much sense unless you wanted to do this for a lot of data types. If you go this route, then the Generics.Deriving.Show and Generics.Deriving.Eq modules in generic-deriving provide templates for these instances which could be modified to treat functions specially, allowing you to derive per-datatype instances using some stub instances something like:
instance Show Foo where showsPrec = myGenericShowsPrec
instance Eq Foo where (==) = myGenericEquality
I proposed an idea for adding annotations to fields via fields, that allows operating on behaviour of individual fields.
data A = A
{ a :: Int
, b :: Int
, c :: Int -> Int via Ignore (Int->Int)
}
deriving
stock GHC.Generic
deriving (Eq, Show)
via Generically A -- assuming Eq (Generically A)
-- Show (Generically A)
But this is already possible with the "microsurgery" library, but you might have to write some boilerplate to get it going. Another solution is to write separate behaviour in "sums-of-products style"
data A = A Int Int (Int->Int)
deriving
stock GHC.Generic
deriving
anyclass SOP.Generic
deriving (Eq, Show)
via A <-𝈖-> '[ '[ Int, Int, Ignore (Int->Int) ] ]

differences: GADT, data family, data family that is a GADT

What/why are the differences between those three? Is a GADT (and regular data types) just a shorthand for a data family? Specifically what's the difference between:
data GADT a where
MkGADT :: Int -> GADT Int
data family FGADT a
data instance FGADT a where -- note not FGADT Int
MkFGADT :: Int -> FGADT Int
data family DF a
data instance DF Int where -- using GADT syntax, but not a GADT
MkDF :: Int -> DF Int
(Are those examples over-simplified, so I'm not seeing the subtleties of the differences?)
Data families are extensible, but GADTs are not. OTOH data family instances must not overlap. So I couldn't declare another instance/any other constructors for FGADT; just like I can't declare any other constructors for GADT. I can declare other instances for DF.
With pattern matching on those constructors, the rhs of the equation does 'know' that the payload is Int.
For class instances (I was surprised to find) I can write overlapping instances to consume GADTs:
instance C (GADT a) ...
instance {-# OVERLAPPING #-} C (GADT Int) ...
and similarly for (FGADT a), (FGADT Int). But not for (DF a): it must be for (DF Int) -- that makes sense; there's no data instance DF a, and if there were it would overlap.
ADDIT: to clarify #kabuhr's answer (thank you)
contrary to what I think you're claiming in part of your question, for a plain data family, matching on a constructor does not perform any inference
These types are tricky, so I expect I'd need explicit signatures to work with them. In that case the plain data family is easiest
inferDF (MkDF x) = x -- works without a signature
The inferred type inferDF :: DF Int -> Int makes sense. Giving it a signature inferDF :: DF a -> a doesn't make sense: there is no declaration for a data instance DF a .... Similarly with foodouble :: Foo Char a -> a there is no data instance Foo Char a ....
GADTs are awkward, I already know. So neither of these work without an explicit signature
inferGADT (MkGADT x) = x
inferFGADT (MkFGADT x) = x
Mysterious "untouchable" message, as you say. What I meant in my "matching on those constructors" comment was: the compiler 'knows' on rhs of an equation that the payload is Int (for all three constructors), so you'd better get any signatures consistent with that.
Then I'm thinking data GADT a where ... is as if data instance GADT a where .... I can give a signature inferGADT :: GADT a -> a or inferGADT :: GADT Int -> Int (likewise for inferFGADT). That makes sense: there is a data instance GADT a ... or I can give a signature at a more specific type.
So in some ways data families are generalisations of GADTs. I also see as you say
So, in some ways, GADTs are generalizations of data families.
Hmm. (The reason behind the question is that GHC Haskell has got to the stage of feature bloat: there's too many similar-but-different extensions. I was trying to prune it down to a smaller number of underlying abstractions. Then #HTNW's approach of explaining in terms of yet further extensions is opposite to what would help a learner. IMO existentials in data types should be chucked out: use GADTs instead. PatternSynonyms should be explained in terms of data types and mapping functions between them, not the other way round. Oh, and there's some DataKinds stuff, which I skipped over on first reading.)
As a start, you should think of a data family as a collection of independent ADTs that happen to be indexed by a type, while a GADT is a single data type with an inferrable type parameter where constraints on that parameter (typically, equality constraints like a ~ Int) can be brought into scope by pattern matching.
This means that the biggest difference is that, contrary to what I think you're claiming in part of your question, for a plain data family, matching on a constructor does not perform any inference on the type parameter. In particular, this typechecks:
inferGADT :: GADT a -> a
inferGADT (MkGADT n) = n
but this does not:
inferDF :: DF a -> a
inferDF (MkDF n) = n
and without type signatures, the first would fail to type check (with a mysterious "untouchable" message) while the second would be inferred as DF Int -> Int.
The situation becomes quite a bit more confusing for something like your FGADT type that combines data families with GADTs, and I confess I haven't really thought about how this works in detail. But, as an interesting example, consider:
data family Foo a b
data instance Foo Int a where
Bar :: Double -> Foo Int Double
Baz :: String -> Foo Int String
data instance Foo Char Double where
Quux :: Double -> Foo Char Double
data instance Foo Char String where
Zlorf :: String -> Foo Char String
In this case, Foo Int a is a GADT with an inferrable a parameter:
fooint :: Foo Int a -> a
fooint (Bar x) = x + 1.0
fooint (Baz x) = x ++ "ish"
but Foo Char a is just a collection of separate ADTs, so this won't typecheck:
foodouble :: Foo Char a -> a
foodouble (Quux x) = x
for the same reason inferDF won't typecheck above.
Now, getting back to your plain DF and GADT types, you can largely emulate DFs just using GADTs. For example, if you have a DF:
data family MyDF a
data instance MyDF Int where
IntLit :: Int -> MyDF Int
IntAdd :: MyDF Int -> MyDF Int -> MyDF Int
data instance MyDF Bool where
Positive :: MyDF Int -> MyDF Bool
you can write it as a GADT just by writing separate blocks of constructors:
data MyGADT a where
-- MyGADT Int
IntLit' :: Int -> MyGADT Int
IntAdd' :: MyGADT Int -> MyGADT Int -> MyGADT Int
-- MyGADT Bool
Positive' :: MyGADT Int -> MyGADT Bool
So, in some ways, GADTs are generalizations of data families. However, a major use case for data families is defining associated data types for classes:
class MyClass a where
data family MyRep a
instance MyClass Int where
data instance MyRep Int = ...
instance MyClass String where
data instance MyRep String = ...
where the "open" nature of data families is needed (and where the pattern-based inference methods of GADTs aren't helpful).
I think the difference becomes clear if we use PatternSynonyms-style type signatures for data constructors. Lets start with Haskell 98
data D a = D a a
You get a pattern type:
pattern D :: forall a. a -> a -> D a
it can be read in two directions. D, in "forward" or expression contexts, says, "forall a, you can give me 2 as and I'll give you a D a". "Backwards", as a pattern, it says, "forall a, you can give me a D a and I'll give you 2 as".
Now, the things you write in a GADT definition are not pattern types. What are they? Lies. Lies lies lies. Give them attention only insofar as the alternative is writing them out manually with ExistentialQuantification. Let's use this one
data GD a where
GD :: Int -> GD Int
You get
-- vv ignore
pattern GD :: forall a. () => (a ~ Int) => Int -> GD a
This says: forall a, you can give me a GD a, and I can give you a proof that a ~ Int, plus an Int.
Important observation: The return/match type of a GADT constructor is always the "data type head". I defined data GD a where ...; I got GD :: forall a. ... GD a. This is also true for Haskell 98 constructors, and also data family constructors, though it's a bit more subtle.
If I have a GD a, and I don't know what a is, I can pass into GD anyway, even though I wrote GD :: Int -> GD Int, which seems to say I can only match it with GD Ints. This is why I say GADT constructors lie. The pattern type never lies. It clearly states that, forall a, I can match a GD a with the GD constructor and get evidence for a ~ Int and a value of Int.
Ok, data familys. Lets not mix them with GADTs yet.
data Nat = Z | S Nat
data Vect (n :: Nat) (a :: Type) :: Type where
VNil :: Vect Z a
VCons :: a -> Vect n a -> Vect (S n) a -- try finding the pattern types for these btw
data family Rect (ns :: [Nat]) (a :: Type) :: Type
newtype instance Rect '[] a = RectNil a
newtype instance Rect (n : ns) a = RectCons (Vect n (Rect ns a))
There are actually two data type heads now. As #K.A.Buhr says, the different data instances act like different data types that just happen to share a name. The pattern types are
pattern RectNil :: forall a. a -> Rect '[] a
pattern RectCons :: forall n ns a. Vect n (Rect ns a) -> Rect (n : ns) a
If I have a Rect ns a, and I don't know what ns is, I cannot match on it. RectNil only takes Rect '[] as, RectCons only takes Rect (n : ns) as. You might ask: "why would I want a reduction in power?" #KABuhr has given one: GADTs are closed (and for good reason; stay tuned), families are open. This doesn't hold in Rect's case, as these instances already fill up the entire [Nat] * Type space. The reason is actually newtype.
Here's a GADT RectG:
data RectG :: [Nat] -> Type -> Type where
RectGN :: a -> RectG '[] a
RectGC :: Vect n (RectG ns a) -> RectG (n : ns) a
I get
-- it's fine if you don't get these
pattern RectGN :: forall ns a. () => (ns ~ '[]) => a -> RectG ns a
pattern RectGC :: forall ns' a. forall n ns. (ns' ~ (n : ns)) =>
Vect n (RectG ns a) -> RectG ns' a
-- just note that they both have the same matched type
-- which means there needs to be a runtime distinguishment
If I have a RectG ns a and don't know what ns is, I can still match on it just fine. The compiler has to preserve this information with a data constructor. So, if I had a RectG [1000, 1000] Int, I would incur an overhead of one million RectGN constructors that all "preserve" the same "information". Rect [1000, 1000] Int is fine, though, as I do not have the ability to match and tell whether a Rect is RectNil or RectCons. This allows the constructor to be newtype, as it holds no information. I would instead use a different GADT, somewhat like
data SingListNat :: [Nat] -> Type where
SLNN :: SingListNat '[]
SLNCZ :: SingListNat ns -> SingListNat (Z : ns)
SLNCS :: SingListNat (n : ns) -> SingListNat (S n : ns)
that stores the dimensions of a Rect in O(sum ns) space instead of O(product ns) space (I think those are right). This is also why GADTs are closed and families are open. A GADT is just like a normal data type except it has equality evidence and existentials. It doesn't make sense to add constructors to a GADT any more than it makes sense to add constructors to a Haskell 98 type, because any code that doesn't know about one of the constructors is in for a very bad time. It's fine for families though, because, as you noticed, once you define a branch of a family, you cannot add more constructors in that branch. Once you know what branch you're in, you know the constructors, and no one can break that. You're not allowed to use any constructors if you don't know which branch to use.
Your examples don't really mix GADTs and data families. Pattern types are nifty in that they normalize away superficial differences in data definitions, so let's take a look.
data family FGADT a
data instance FGADT a where
MkFGADT :: Int -> FGADT Int
Gives you
pattern MkFGADT :: forall a. () => (a ~ Int) => Int -> FGADT a
-- no different from a GADT; data family does nothing
But
data family DF a
data instance DF Int where
MkDF :: Int -> DF Int
gives
pattern MkDF :: Int -> DF Int
-- GADT syntax did nothing
Here's a proper mixing
data family Map k :: Type -> Type
data instance Map Word8 :: Type -> Type where
MW8BitSet :: BitSet8 -> Map Word8 Bool
MW8General :: GeneralMap Word8 a -> Map Word8 a
Which gives patterns
pattern MW8BitSet :: forall a. () => (a ~ Bool) => BitSet8 -> Map Word8 a
pattern MW8General :: forall a. GeneralMap Word8 a -> Map Word8 a
If I have a Map k v and I don't know what k is, I can't match it against MW8General or MW8BitSet, because those only want Map Word8s. This is the data family's influence. If I have a Map Word8 v and I don't know what v is, matching on the constructors can reveal to me whether it's known to be Bool or is something else.

Haskell type checking in code

Could you please show me how can I check if type of func is Tree or not, in code not in command page?
data Tree = Leaf Float | Gate [Char] Tree Tree deriving (Show, Eq, Ord)
func a = Leaf a
Well, there are a few answers, which zigzag in their answers to "is this possible".
You could ask ghci
ghci> :t func
func :: Float -> Tree
which tells you the type.
But you said in your comment that you are wanting to write
if func == Tree then 0 else 1
which is not possible. In particular, you can't write any function like
isTree :: a -> Bool
isTree x = if x :: Tree then True else False
because it would violate parametericity, which is a neat property that all polymorphic functions in Haskell have, which is explored in the paper Theorems for Free.
But you can write such a function with some simple generic mechanisms that have popped up; essentially, if you want to know the type of something at runtime, it needs to have a Typeable constraint (from the module Data.Typeable). Almost every type is Typeable -- we just use the constraint to indicate the violation of parametericity and to indicate to the compiler that it needs to pass runtime type information.
import Data.Typeable
import Data.Maybe (isJust)
data Tree = Leaf Float | ...
deriving (Typeable) -- we need Trees to be typeable for this to work
isTree :: (Typeable a) => a -> Bool
isTree x = isJust (cast x :: Maybe Tree)
But from my experience, you probably don't actually need to ask this question. In Haskell this question is a lot less necessary than in other languages. But I can't be sure unless I know what you are trying to accomplish by asking.
Here's how to determine what the type of a binding is in Haskell: take something like f a1 a2 a3 ... an = someExpression and turn it into f = \a1 -> \a2 -> \a3 -> ... \an -> someExpression. Then find the type of the expression on the right hand side.
To find the type of an expression, simply add a SomeType -> for each lambda, where SomeType is whatever the appropriate type of the bound variable is. Then use the known types in the remaining (lambda-less) expression to find its actual type.
For your example: func a = Leaf a turns into func = \a -> Leaf a. Now to find the type of \a -> Leaf a, we add a SomeType -> for the lambda, where SomeType is Float in this case. (because Leaf :: Float -> Tree, so if Leaf is applied to a, then a :: Float) This gives us Float -> ???
Now we find the type of the lambda-less expression Leaf (a :: Float), which is Tree because Leaf :: Float -> Tree. Now we can add substitute Tree for ??? to get Float -> Tree, the actual type of func.
As you can see, we did that all by just looking at the source code. This means that no matter what, func will always have that type, so there is no need to check whether or not it does. In fact, the compiler will throw out all information about the type of func when it compiles your code, and your code will still work properly because of type-checking. (The caveat to this (Typeable) is pointed out in the other answer)
TL;DR: Haskell is statically typed, so func always has the type Float -> Tree, so asking how to check whether that is true doesn't make sense.

Check if it is a specific type - haskell

I thought it would be really easy to find the answer online but I had no luck with that. Which means that my question should't be a question but I am sure more people new to Haskell might come up with the same question.
So how do I check if a value is of a certain type?
I have the following data type defined and I wanna check whether the input on a function is of a specific type.
data MyType a = MyInt Int | MyOther a (MyType a)
First, your data declaration will not work. Let's assume you're using this type:
data MyType a = MyInt Int | MyOther a (MyType a)
then you can have functions that take a MyType a, some specific MyType (e.g. MyType Int) or a constrained MyType (e.g. Num a => MyType a).
If you want to know whether you have a MyInt or a MyOther, you can simply use pattern matching:
whichAmI :: MyType a -> String
whichAmI (MyInt i) = "I'm an Int with value " ++ show i
whichAmI (MyOther _ _) = "I'm something else"
When you want to know if the type in the parameter a is a Num, or what type it is, you will run into a fundamental Haskell limitation. Haskell is statically typed so there is no such dynamic checking of what the a in MyType a is.
The solution is to limit your function if you need a certain type of a. For example we can have:
mySum :: Num a => MyType a -> a
mySum (MyInt i) = fromIntegral i
mySum (MyOther n m) = n + mySum m
or we can have a function that only works if a is a Bool:
trueOrGE10 :: MyType Bool -> Bool
trueOrGE10 (MyInt i) = i >= 10
trueOrGE10 (MyOther b _) = b
As with all Haskell code, it will need to be possible to determine at compile-time whether a particular expression you put into one of these functions has the right type.

Is there a compiler-extension for untagged union types in Haskell?

In some languages (#racket/typed, for example), the programmer can specify a union type without discriminating against it, for instance, the type (U Integer String) captures integers and strings, without tagging them (I Integer) (S String) in a data IntOrStringUnion = ... form or anything like that.
Is there a way to do the same in Haskell?
Either is what you're looking for... ish.
In Haskell terms, I'd describe what you're looking for as an anonymous sum type. By anonymous, I mean that it doesn't have a defined name (like something with a data declaration). By sum type, I mean a data type that can have one of several (distinguishable) types; a tagged union or such. (If you're not familiar with this terminology, try Wikipedia for starters.)
We have a well-known idiomatic anonymous product type, which is just a tuple. If you want to have both an Int and a String, you just smush them together with a comma: (Int, String). And tuples (seemingly) can go on forever--(Int, String, Double, Word), and you can pattern-match the same way. (There's a limit, but never mind.)
The well-known idiomatic anonymous sum type is Either, from Data.Either (and the Prelude):
data Either a b = Left a | Right b
deriving (Eq, Ord, Read, Show, Typeable)
It has some shortcomings, most prominently a Functor instance that favors Right in a way that's odd in this context. The problem is that chaining it introduces a lot of awkwardness: the type ends up like Either (Int (Either String (Either Double Word))). Pattern matching is even more awkward, as others have noted.
I just want to note that we can get closer to (what I understand to be) the Racket use case. From my extremely brief Googling, it looks like in Racket you can use functions like isNumber? to determine what type is actually in a given value of a union type. In Haskell, we usually do that with case analysis (pattern matching), but that's awkward with Either, and function using simple pattern-matching will likely end up hard-wired to a particular union type. We can do better.
IsNumber?
I'm going to write a function I think is an idiomatic Haskell stand-in for isNumber?. First, we don't like doing Boolean tests and then running functions that assume their result; instead, we tend to just convert to Maybe and go from there. So the function's type will end with -> Maybe Int. (Using Int as a stand-in for now.)
But what's on the left hand of the arrow? "Something that might be an Int -- or a String, or whatever other types we put in the union." Uh, okay. So it's going to be one of a number of types. That sounds like typeclass, so we'll put a constraint and a type variable on the left hand of the arrow: MightBeInt a => a -> Maybe Int. Okay, let's write out the class:
class MightBeInt a where
isInt :: a -> Maybe Int
fromInt :: Int -> a
Okay, now how do we write the instances? Well, we know if the first parameter to Either is Int, we're golden, so let's write that out. (Incidentally, if you want a nice exercise, only look at the instance ... where parts of these next three code blocks, and try to implement that class members yourself.)
instance MightBeInt (Either Int b) where
isInt (Left i) = Just i
isInt _ = Nothing
fromInt = Left
Fine. And ditto if Int is the second parameter:
instance MightBeInt (Either a Int) where
isInt (Right i) = Just i
isInt _ = Nothing
fromInt = Right
But what about Either String (Either Bool Int)? The trick is to recurse on the right hand type: if it's not Int, is it an instance of MightBeInt itself?
instance MightBeInt b => MightBeInt (Either a b) where
isInt (Right xs) = isInt xs
isInt _ = Nothing
fromInt = Right . fromInt
(Note that these all require FlexibleInstances and OverlappingInstances.) It took me a long time to get a feel for writing and reading these class instances; don't worry if this instance is surprising. The punch line is that we can now do this:
anInt1 :: Either Int String
anInt1 = fromInt 1
anInt2 :: Either String (Either Int Double)
anInt2 = fromInt 2
anInt3 :: Either String Int
anInt3 = fromInt 3
notAnInt :: Either String Int
notAnInt = Left "notint"
ghci> isInt anInt3
Just 3
ghci> isInt notAnInt
Nothing
Great!
Generalizing
Okay, but now do we need to write another type class for each type we want to look up? Nope! We can parameterize the class by the type we want to look up! It's a pretty mechanical translation; the only question is how to tell the compiler what type we're looking for, and that's where Proxy comes to the rescue. (If you don't want to install tagged or run base 4.7, just define data Proxy a = Proxy. It's nothing special, but you'll need PolyKinds.)
class MightBeA t a where
isA :: proxy t -> a -> Maybe t
fromA :: t -> a
instance MightBeA t t where
isA _ = Just
fromA = id
instance MightBeA t (Either t b) where
isA _ (Left i) = Just i
isA _ _ = Nothing
fromA = Left
instance MightBeA t b => MightBeA t (Either a b) where
isA p (Right xs) = isA p xs
isA _ _ = Nothing
fromA = Right . fromA
ghci> isA (Proxy :: Proxy Int) anInt3
Just 3
ghci> isA (Proxy :: Proxy String) notAnInt
Just "notint"
Now the usability situation is... better. The main thing we've lost, by the way, is the exhaustiveness checker.
Notational Parity With (U String Int Double)
For fun, in GHC 7.8 we can use DataKinds and TypeFamilies to eliminate the infix type constructors in favor of type-level lists. (In Haskell, you can't have one type constructor--like IO or Either--take a variable number of parameters, but a type-level list is just one parameter.) It's just a few lines, which I'm not really going to explain:
type family OneOf (as :: [*]) :: * where
OneOf '[] = Void
OneOf '[a] = a
OneOf (a ': as) = Either a (OneOf as)
Note that you'll need to import Data.Void. Now we can do this:
anInt4 :: OneOf '[Int, Double, Float, String]
anInt4 = fromInt 4
ghci> :kind! OneOf '[Int, Double, Float, String]
OneOf '[Int, Double, Float, String] :: *
= Either Int (Either Double (Either Float [Char]))
In other words, OneOf '[Int, Double, Float, String] is the same as Either Int (Either Double (Either Float [Char])).
You need some kind of tagging because you need to be able to check if a value is actually an Integer or a String to use it for anything. One way to alleviate having to create a custom ADT for every combination is to use a type such as
{-# LANGUAGE TypeOperators #-}
data a :+: b = L a | R b
infixr 6 :+:
returnsIntOrString :: Integer -> Integer :+: String
returnsIntOrString i
| i `rem` 2 == 0 = R "Even"
| otherwise = L (i * 2)
returnsOneOfThree :: Integer -> Integer :+: String :+: Bool
returnsOneOfThree i
| i `rem` 2 == 0 = (R . L) "Even"
| i `rem` 3 == 0 = (R . R) False
| otherwise = L (i * 2)

Resources