How can I do this in Haskell?
equal(S,S) -> true;
equal(S1, S2) -> {differ, S1, S2}.
Haskell has a perfectly serviceable (==) operator for checking equality (on types for which equality is defined) so I'm assuming you're referring to something else here besides merely testing equality.
I don't know Erlang, but given that you wrote equal(S, S) my first guess would be that you want pattern matches to express equality by reusing the variable name. Unfortunately Haskell (and ML-style in general) pattern matching is less powerful than in languages like Prolog; all the pattern can do is bind variables, not perform full unification.
It's true that there are constant value patterns like foo [1,2] = ... but that's just syntactic sugar for a binding and equality check, and it's only done for constant values, not variables.
The usual Haskell approach would probably be guards, like this:
data EqualResult a b = Yep | Nope (a, b) deriving (Show, Eq)
equal :: (Eq a) => a -> a -> EqualResult a a
equal s1 s2 | s1 == s2  = Yep
            | otherwise = Nope (s1, s2)
On the off chance that you wanted some sort of reference equality instead of checking for equal values: that isn't available, because ordinary Haskell values have no observable identity, so the notion doesn't even make sense.
Edit: It has been pointed out to me that you may also have been asking about returning different result types. Working with types should be covered well in any introduction to Haskell, but the short version in this case is that if you need to return one of two possible types, you need a data type with one constructor for each; you then examine the result using pattern matching (in a declaration or case expression).
In this case, to make it look more like your function I've made a special-purpose type with two constructors: one indicating equality (with no further details) and one indicating inequality that holds a pair of values. You can also do it in a generic way using the built-in type Either a b, which has two constructors Left a and Right b.
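For comparison, here is a minimal sketch of the generic Either variant mentioned above. The names equalE and describe are made up for illustration; using Left for the "differ" case and Right () for the "equal" case is just one possible convention.

```haskell
-- Hypothetical variant of `equal` built on the standard Either type.
-- Left carries the two differing values; Right () signals equality.
equalE :: Eq a => a -> a -> Either (a, a) ()
equalE s1 s2
  | s1 == s2  = Right ()
  | otherwise = Left (s1, s2)

-- Examine the result with a case expression.
describe :: (Eq a, Show a) => a -> a -> String
describe x y = case equalE x y of
  Right ()    -> "true"
  Left (a, b) -> "differ: " ++ show a ++ ", " ++ show b

main :: IO ()
main = do
  putStrLn (describe (1 :: Int) 1)
  putStrLn (describe "foo" "bar")
```

The special-purpose type reads better at call sites, while Either lets you reuse library functions such as either.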
Why are these not equivalent?
show $ if someCondition then someInt else someDouble
and
if someCondition then show someInt else show someDouble
I understand that if you isolate the if ... else part of the first example as an expression by itself, you can't give it an anonymous sum type such as Int | Double, the kind of thing that is easy in TypeScript (I mention TypeScript because it is the language I use most often and it supports such types). You would have to resort to the Either type and then call show based on it.
The example I gave here is trivial but to me it makes more sense to think "Okay we are going to show something, and that something depends on someCondition" rather than "Okay if someCondition is true then show someInt otherwise show someDouble", and also allows for less code duplication (here the show is repeated twice but it could also be a long function application and instead of an if ... else there could be >2 branches to consider)
In my mind it should be easy for the compiler to check whether each of the types that make up the sum type (here Int | Double) can be passed to the show function, and so decide whether the types are correct. Even better, show always returns a String no matter the type of its argument, so the compiler doesn't have to carry along all the possible "branches" (that is, all the possible types).
Is it by choice that such a feature doesn't exist? Or is implementing it harder than I think?
All parts of an expression must be well-typed. The type of if someCondition then someInt else someDouble would have to be something like exists a. Show a => a, but Haskell doesn't support that kind of existential quantification.
Update: As chi points out in a comment, this would also be possible if Haskell had support for union/intersection types (which are not the same as sum/product types), but it unfortunately doesn't.
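As an aside, GHC's ExistentialQuantification extension does allow a wrapped form of "some type with a Show instance". The sketch below is only an illustration of that workaround: Showable and pick are hypothetical names, and the explicit wrapper constructor is exactly the extra ceremony the question hoped to avoid.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- Hypothetical wrapper: hides the concrete type, keeps only its Show instance.
data Showable = forall a. Show a => Showable a

instance Show Showable where
  show (Showable x) = show x

-- Both branches now have the same type, so the if-expression is well-typed.
pick :: Bool -> Int -> Double -> Showable
pick cond i d = if cond then Showable i else Showable d

main :: IO ()
main = do
  print (pick True 42 1.5)
  print (pick False 42 1.5)
```

Without any extension, applying either show show to a value of type Either Int Double achieves the same effect.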
There are product types with lightweight syntax, written (,), in Haskell. One would think that a sum type with a lightweight syntax, something like (Int | String), would be a great idea. The reality is more complicated. Let's see why (I'm taking some liberties with Num, they are not important).
if someCondition then 42 else "helloWorld"
If this should return a value of type like (Int | String), then what should the following return?
if someCondition then 42 else 0
(Int | Int) obviously, but if this is distinct from plain Int then we're in deep trouble. So (Int | Int) should be identical to plain Int.
One can immediately see that this is not just lightweight syntax for sum types, but a wholly new language feature. A different kind of type system if you will. Should we have one?
Let's look at this function.
mysteryType x a b = if x then a else b
Now what type does mysteryType have? Obviously
mysteryType :: Bool -> a -> b -> (a|b)
right? Now what if a and b are the same type?
let x = mysteryType True 42 0
This should be plain Int, as we agreed previously. So mysteryType sometimes returns an anonymous sum type and sometimes does not, depending on what arguments you pass. How would you pattern match on such an expression? What on Earth can you do with it? Apart from trivial things like show (or whatever methods of other type classes it would be an instance of), not a whole lot. Unless you add run-time type information to the language, that is, so that typeof is available, and that would make Haskell an entirely different language.
So yeah. Why isn't Haskell a TypeScript? Because we don't need another TypeScript. If you want TypeScript, you know where to find it.
Consider these functions
f1 :: Maybe Int
f1 = return 1
f2 :: [Int]
f2 = return 1
Both have the same statement return 1. But the results are different. f1 gives the value Just 1 and f2 gives the value [1].
It looks like Haskell invokes two different versions of return based on the return type. I'd like to know more about this kind of function invocation. Is there a name for this feature in programming languages?
This is a long meandering answer!
As you've probably seen from the comments and Thomas's excellent (but very technical) answer, you've asked a very hard question. Well done!
Rather than try to explain the technical answer I've tried to give you a broad overview of what Haskell does behind the scenes without diving into technical detail. Hopefully it will help you to get a big picture view of what's going on.
return is an example of type inference.
Most modern languages have some notion of polymorphism. For example var x = 1 + 1 will set x equal to 2. In a statically typed language 2 will usually be an int. If you say var y = 1.0 + 1.0 then y will be a float. The operator + (which is just a function with a special syntax) is polymorphic: the version that runs is chosen from the types of its arguments.
Most imperative languages, especially object oriented languages, can only do type inference one way. Every variable has a fixed type. When you call a function it looks at the types of the argument and chooses a version of that function that fits the types (or complains if it can't find one).
When you assign the result of a function to a variable the variable already has a type and if it doesn't agree with the type of the return value you get an error.
So in an imperative language the "flow" of type deduction follows time in your program: deduce the type of a variable, do something with it, and deduce the type of the result. In a dynamically typed language (such as Python or JavaScript) the type of a variable is not assigned until the value of the variable is computed (which is why there don't seem to be types). In a statically typed language the types are worked out ahead of time (by the compiler) but the logic is the same. The compiler works out what the types of variables are going to be, but it does so by following the logic of the program in the same way as the program runs.
In Haskell the type inference also follows the logic of the program. Being Haskell, it does so in a very mathematically pure way (the inference is based on the Hindley-Milner system, and GHC's internal language is a variant of System F). The language of types (that is, the rules by which types are deduced) is similar to Haskell itself.
Now remember Haskell is a lazy language. It doesn't work out the value of anything until it needs it. That's why it makes sense in Haskell to have infinite data structures. It never occurs to Haskell that a data structure is infinite because it doesn't bother to work it out until it needs to.
Now all that lazy magic happens at the type level too. In the same way that Haskell doesn't work out what the value of an expression is until it really needs to, Haskell doesn't work out what the type of an expression is until it really needs to.
Consider this function
func (x : y : rest) = (x,y) : func rest
func _ = []
If you ask Haskell for the type of this function it has a look at the definition, sees [] and : and deduces that it's working with lists. But it never needs to look at the types of x and y, it just knows that they have to be the same because they come out of the same list. So it deduces the type of the function as [a] -> [(a, a)] where a is a type that it hasn't bothered to work out yet.
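To make that concrete, here is the same function with an explicit signature, used at two different element types. The signature is the one :type func reports in GHCi; the sample inputs are only for illustration.

```haskell
-- The function from the text, annotated with the type GHC infers for it.
func :: [a] -> [(a, a)]
func (x : y : rest) = (x, y) : func rest
func _              = []

main :: IO ()
main = do
  print (func [1, 2, 3, 4, 5 :: Int])  -- the unpaired trailing element is dropped
  print (func "abcd")                  -- the same definition works on Chars
```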
So far no magic. But it's useful to notice the difference between this idea and how it would be done in an OO language. Haskell doesn't convert the arguments to Object, do its thing and then convert back. Haskell just hasn't been asked explicitly what the type of the list is. So it doesn't care.
Now try typing the following into ghci
minBound - length ""
minBound : "Hello"
Now what just happened!? minBound must be a Char because I put it on the front of a string, and it must be an integer because I subtracted 0 (the length of the empty string) from it and got a number. Plus the two values are clearly very different.
So what is the type of minBound? Let's ask ghci!
:type minBound
minBound :: Bounded a => a
AAargh! What does that mean? Basically it means that it hasn't bothered to work out exactly what a is, but it has to be Bounded. If you type :info Bounded you get three useful lines
class Bounded a where
minBound :: a
maxBound :: a
and a lot of less useful lines
So if a is Bounded there are values minBound and maxBound of type a.
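A small sketch of how the surrounding context (or an explicit annotation) picks which minBound or maxBound you get. The annotations are there purely for illustration.

```haskell
main :: IO ()
main = do
  -- An explicit annotation fixes the type variable a.
  print (minBound :: Int)
  print (maxBound :: Char)
  print (minBound :: Bool, maxBound :: Bool)
  -- Without an annotation the context decides: consing onto a String forces Char.
  print (minBound : "Hello")
```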
In fact, under the hood an instance of Bounded is just a value; its "type" is a record with fields minBound and maxBound. Because it's a value, Haskell doesn't look at it until it really needs to.
So I appear to have meandered somewhere in the region of the answer to your question. Before we move onto return (which you may have noticed from the comments is a wonderfully complex beast.) let's look at read.
ghci again
read "42" + 7
read "'H'" : "ello"
length (read "[1,2,3]")
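and hopefully you won't be too surprised to find that there are definitions along these lines (simplified: in the real Prelude the class method is readsPrec, and read is built on top of it)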
read :: Read a => String -> a
class Read a where
    read :: String -> a
So Read a is just a record containing a single value, which is a function String -> a. It's very tempting to assume that there is one read function which looks at a string, works out what type is contained in the string and returns that type. But it does the opposite. It completely ignores the string until it's needed. When the value is needed, Haskell first works out what type it's expecting; once it's done that, it goes and gets the appropriate version of the read function and combines it with the string.
Now consider something slightly more complex
readList :: Read a => [String] -> [a]
readList strs = map read strs
Under the hood readList actually takes two arguments
readList' :: Read a -> [String] -> [a]
readList' (Read {read = f}) strs = map f strs
Again, as Haskell is lazy, it only bothers looking at the arguments when it needs to find out the return value. At that point it knows what a is, so the compiler can go and find the right version of Read. Until then it doesn't care.
Hopefully that's given you a bit of an idea of what's happening and why Haskell can "overload" on the return type. But it's important to remember it's not overloading in the conventional sense. Every function has only one definition. It's just that one of the arguments is a bag of functions. readList doesn't ever know what types it's dealing with. It just knows it gets a function String -> a and some Strings. To do the application it just passes the arguments to map. map in turn doesn't even know it gets strings. When you get deeper into Haskell it becomes very important that functions don't know very much about the types they're dealing with.
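The "bag of functions" idea can be written out by hand in ordinary Haskell. This is only a sketch of the idea, not what GHC literally generates: ReadDict, intDict, boolDict and readAll are hypothetical names, and the dictionaries simply reuse the Prelude's read.

```haskell
-- Hypothetical stand-in for the dictionary that accompanies a `Read a` constraint.
newtype ReadDict a = ReadDict { readWith :: String -> a }

-- Two "instances", written as ordinary values.
intDict :: ReadDict Int
intDict = ReadDict read

boolDict :: ReadDict Bool
boolDict = ReadDict read

-- One definition only; the dictionary is just another argument.
readAll :: ReadDict a -> [String] -> [a]
readAll dict strs = map (readWith dict) strs

main :: IO ()
main = do
  print (readAll intDict ["1", "2", "3"])
  print (readAll boolDict ["True", "False"])
```

With a real type class the compiler chooses and passes the dictionary for you, based on the type it expects the result to have.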
Now let's look at return.
Remember how I said that the type system in Haskell was very similar to Haskell itself. Remember that in Haskell functions are just ordinary values.
Does this mean I can have a type that takes a type as an argument and returns another type? Of course it does!
You've seen some type functions already. Maybe takes a type a and returns another type whose values are either Just x (for some x of type a) or Nothing. [] takes a type a and returns the type of lists of as. Type functions in Haskell are usually containers. For example I could define a type function BinaryTree which stores a load of as in a tree-like structure. There are of course lots of much stranger ones.
So, if these type functions are similar to ordinary types I can have a typeclass that contains type functions. One such typeclass is Monad
class Monad m where
    return :: a -> m a
    (>>=) :: m a -> (a -> m b) -> m b
so here m is some type function. If I want to define Monad for m I need to define return and the scary looking operator below it (which is called bind)
As others have pointed out the return is a really misleading name for a fairly boring function. The team that designed Haskell have since realised their mistake and they're genuinely sorry about it. return is just an ordinary function that takes an argument and returns a Monad with that type in it. (You never asked what a Monad actually is so I'm not going to tell you)
Let's define Monad for m = Maybe!
First I need to define return. What should return x be? Remember I'm only allowed to define the function once, so I can't look at x because I don't know what type it is. I could always return Nothing, but that seems a waste of a perfectly good function. Let's define return x = Just x because that's literally the only other thing I can do.
What about the scary bind thing? What can we say about x >>= f? Well, x is a Maybe a for some unknown type a, and f is a function that takes an a and returns a Maybe b. Somehow I need to combine these to get a Maybe b.
So I need to define Nothing >>= f. I can't call f because it needs an argument of type a and I don't have a value of type a; I don't even know what a is. I've only got one choice, which is to define
Nothing >>= f = Nothing
What about Just x >>= f? Well, I know x is of type a and f takes an a as an argument, so I can set y = f x and deduce that y is of type Maybe b. That is exactly the type I need to produce, so ...
Just x >>= f = f x
So I've got a Monad! What if m is List? Well, I can follow a similar sort of logic and define
return x = [x]
[] >>= f = []
(x : xs) >>= f = f x ++ (xs >>= f)
Hooray another Monad! It's a nice exercise to go through the steps and convince yourself that there's no other sensible way of defining this.
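If you want to try these definitions without clashing with the real Monad class, a self-contained sketch could look like this. MyMonad, ret and bind are hypothetical names standing in for Monad, return and (>>=); the instances follow the reasoning above.

```haskell
-- Hypothetical class mirroring Monad: ret plays the role of return,
-- bind plays the role of (>>=).
class MyMonad m where
  ret  :: a -> m a
  bind :: m a -> (a -> m b) -> m b

instance MyMonad Maybe where
  ret x = Just x
  bind Nothing  _ = Nothing
  bind (Just x) f = f x

instance MyMonad [] where
  ret x = [x]
  bind []       _ = []
  bind (x : xs) f = f x ++ bind xs f

-- The expected result type alone selects which `ret` is used.
asMaybe :: Maybe Int
asMaybe = ret 1

asList :: [Int]
asList = ret 1

main :: IO ()
main = do
  print asMaybe
  print asList
  print (bind [1, 2, 3 :: Int] (\x -> [x, x * 10]))
```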
So what happens when I call return 1?
Nothing!
Haskell's lazy, remember. The thunk return 1 (technical term) just sits there until someone needs the value. As soon as Haskell needs the value it knows what type the value should be. In particular it can deduce that m is List. Now that it knows that, Haskell can find the instance of Monad for List. As soon as it does that it has access to the correct version of return.
So finally Haskell is ready to call return, which in this case returns [1]!
The return function is from the Monad class:
class Applicative m => Monad (m :: * -> *) where
...
return :: a -> m a
So return takes any value of type a and results in a value of type m a. The monad, m, as you've observed is polymorphic using the Haskell type class Monad for ad hoc polymorphism.
At this point you probably realize return is not a good, intuitive name. It's not even a built-in function or a statement like in many other languages. In fact a better-named and identically-operating function exists: pure. In almost all cases return = pure.
That is, the function return is the same as the function pure (from the Applicative class) - I often think to myself "this monadic value is purely the underlying a" and I try to use pure instead of return if there isn't already a convention in the codebase.
You can use return (or pure) for any type that is an instance of Monad. This includes the Maybe monad to get a value of type Maybe a:
instance Monad Maybe where
...
return = pure -- which is from Applicative
...
instance Applicative Maybe where
pure = Just
Or for the list monad to get a value of [a]:
instance Applicative [] where
{-# INLINE pure #-}
pure x = [x]
Or, as a more complex example, Aeson's parse monad to get a value of type Parser a:
instance Applicative Parser where
pure a = Parser $ \_path _kf ks -> ks a
Suppose I want to define the type Mod4 of integers modulo 4. After all, Int is Mod2^64. One obvious way I could go is
data Mod4 = ZeroMod4 | OneMod4 | TwoMod4 | ThreeMod4
However I could also do this
data Mod4 = Mod4 Integer
instance Eq Mod4 where
(Mod4 x) == (Mod4 y) = (x-y) `mod` 4 == 0
But then this function is problematic :
f :: Mod4 -> Mod4
f (Mod4 x) = if x < 20 then Mod4 0 else Mod4 1
f (Mod4 16) is different from f (Mod4 20), whereas those two arguments are ==. So I end up with two sorts of equality: representation in memory (Mod4 16 is different from Mod4 20) and ==.
Since all functions can pattern match on their arguments, they can always bypass any == operator. Why didn't Haskell just take the representation in memory as the definition of equality? This way all types become trivially equatable.
Actually, equality is implied by the very concept of function : a graph that produces equal outputs when given equal inputs. So it makes little sense to speak of a function on a type that is not equatable.
Why didn't Haskell just take the representation in memory as the definition of equality? This way all types become trivially equatable.
Nope. You can't compare values of type Integer -> Bool. Functions can not be compared, in general.
Back to the blackboard. How to design equality in a typed language?
One option is to let (==) :: a -> a -> Bool, and throw an exception if a is a function. See e.g. OCaml.
Another option is to partition types in equatable/not equatable. This is eqtype in SML.
Another, but related, option is to express "eq-ability" as a constraint on the polymorphism. Eq in Haskell.
Now, Eq might have been more special. E.g. users might have been forbidden from defining instances themselves and required to use deriving Eq, similarly to how Typeable works now.
The Haskell designers chose instead to allow users to define their own comparison function. The users might know some "smarter" way. E.g. to compare a 10-field record, start by comparing the usually-different fields, and compare usually-equal ones later, trying to improve efficiency.
Note that, if we don't export the data type constructor, we can make equality a coarser equivalence and still have it be useful. E.g. Data.Set.Set equates different (balanced) trees when they represent the same set, yet the exported interface never breaks the equivalence, so equality looks like equality from outside.
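Applied to the Mod4 example from the question, hiding the constructor might look like the sketch below. The names mkMod4 and toInteger4 are hypothetical, and here the smart constructor goes one step further by storing a canonical representative, so even the derived Eq agrees with arithmetic modulo 4.

```haskell
-- In a real project this would live in its own module with an export list like
--   module Mod4 (Mod4, mkMod4, toInteger4) where
-- so that the Mod4 constructor stays private. All names are hypothetical.
newtype Mod4 = Mod4 Integer deriving (Eq, Show)

-- Smart constructor: always store the canonical representative 0..3.
mkMod4 :: Integer -> Mod4
mkMod4 n = Mod4 (n `mod` 4)

toInteger4 :: Mod4 -> Integer
toInteger4 (Mod4 n) = n

-- The problematic function from the question, written against the public API only.
f :: Mod4 -> Mod4
f m = if toInteger4 m < 20 then mkMod4 0 else mkMod4 1

main :: IO ()
main = do
  print (mkMod4 16 == mkMod4 20)
  print (f (mkMod4 16) == f (mkMod4 20))
```

Because clients can only build values through mkMod4, no function outside the module can tell 16 and 20 apart.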
So it makes little sense to speak of a function on a type that is not equatable.
True, when "not equatable" is interpreted in a mathematical sense. However, when it is interpreted as "the equality predicate is not computable", it makes a lot of sense. We can speak of a function working on values whose type has undecidable equality.
There is a Tuple as a Product of any number of types and there is an Either as a Sum of two types. What is the name for a Sum of any number of types, something like this
data Thing a b c d ... = Thing1 a | Thing2 b | Thing3 c | Thing4 d | ...
Is there any standard implementation?
Before I make the suggestion against using such types, let me explain some background.
Either is a sum type, and a pair or 2-tuple is a product type. Sums and products can exist over arbitrarily many underlying types (sets). However, in Haskell, only tuples come in a variety of sizes out of the box. Either, on the other hand, has to be (arbitrarily) nested to achieve that: Either Foo (Either Bar Baz).
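A minimal sketch of the nesting approach, with helper functions so the nesting doesn't leak into every call site. Sum3, first3, second3, third3 and sum3 are hypothetical names, not from any standard library.

```haskell
-- Hypothetical three-way sum built by nesting Either.
type Sum3 a b c = Either a (Either b c)

-- Injections into each of the three alternatives.
first3 :: a -> Sum3 a b c
first3 = Left

second3 :: b -> Sum3 a b c
second3 = Right . Left

third3 :: c -> Sum3 a b c
third3 = Right . Right

-- Eliminator, analogous to `either` for the two-way case.
sum3 :: (a -> r) -> (b -> r) -> (c -> r) -> Sum3 a b c -> r
sum3 f g h = either f (either g h)

main :: IO ()
main = do
  let values = [first3 (1 :: Int), second3 "two", third3 True]
  mapM_ (putStrLn . sum3 show id show) values
```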
Of course it's easy to instead define e.g. the types Either3 and Either4 etc, in the spirit of 3-tuples, 4-tuples and so on.
data Either3 a b c = Left3 a | Middle3 b | Right3 c
data Either4 a b c d = LeftMost4 a | Left4 b | Right4 c | RightMost4 d
...if you really want. Or you can find a library that does this, but I doubt you could call it "standard" by any standards...
However, if you do define your own generic sum and product types, they will be completely isomorphic to any type that is structurally equivalent, regardless of where it is defined. This means that you can, with relative ease, nicely adapt your code to interface with any other code that uses an alternative definition.
Furthermore, it is even very likely to be beneficial because that way you can give more meaningful, descriptive names to your sum and product types, instead of going with the generic tuple and either. In fact, some people advise for using custom types because it essentially adds static type safety. This also applies to non-sum/product types, e.g.:
employment :: Bool -- so which one is unemployed and which one is employed?
data Empl = Employed | Unemployed
employment' :: Empl -- no ambiguity
or
person :: (Name, Age) -- yeah but when you see ("Erik", 29), is it just some random pair of name and age, or does it represent a person?
data Person = Person { name :: Name, age :: Age }
person' :: Person -- no ambiguity
— above, Person really encodes a product type, but with more meaning attached to it. You can also do newtype Person = Person (Name, Age), and it's actually quite equivalent anyway. So I always just prefer a nice and intention-revealing custom type. The same goes about Either and custom sum types.
So basically, Haskell gives you all the tools necessary to quickly build your own custom types with very clean and readable syntax, so it's best if we use them rather than resorting to primitive types like tuples and Either. However, it's nice to know about this isomorphism, for example in the context of generic programming. If you want to know more about that, you can search for "scrap your boilerplate" and "template your boilerplate" and just "(datatype) generic programming".
P.S. The reason they are called sum and product types respectively is that they correspond to the disjoint union (sum) and the Cartesian product of sets. Therefore, the number of values (or unique instances if you will) in the set that is described by the product type (a, b) is the product of the number of values in a and the number of values in b. For example (Bool, Bool) has exactly 2*2 = 4 values: (True, True), (False, False), (True, False), (False, True).
However Either Bool Bool has 2+2 values, Left True, Left False, Right True, Right False. So it happens to be the same number but that's obviously not the case in general.
But of course this can also be said about our custom Person product type, so again, there is little reason to use Either and tuples.
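The counting argument from the P.S. can be checked by enumerating the values of two small types. This sketch uses Bool (2 values) and Ordering (3 values) so that the sum and the product come out different; pairs and choices are hypothetical names.

```haskell
-- Every value of a small product type.
pairs :: [(Bool, Ordering)]
pairs = [(b, o) | b <- [minBound ..], o <- [minBound ..]]

-- Every value of the corresponding sum type.
choices :: [Either Bool Ordering]
choices = map Left [minBound ..] ++ map Right [minBound ..]

main :: IO ()
main = do
  print (length pairs)    -- product: 2 * 3
  print (length choices)  -- sum: 2 + 3
```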
There are some predefined versions in HaXml package with OneOfN, TwoOfN, .. constructors.
In a generic context, this is usually done inductively, using Either or
data (:+:) f g a = L1 (f a) | R1 (g a)
The latter is defined in GHC.Generics to match the funny way it handles things.
In fact, the generic approach is to break every algebraic datatype down into (:+:) and
data (:*:) f g a = f a :*: g a
along with some extra stuff. That is, it turns everything into binary sums and binary products.
In a more concrete context, you're almost always better off using a custom algebraic datatype for things bigger than pairs or with more options than Either, as others have discussed. Slightly larger tuples (triples and maybe 4-tuples) can be useful for local one-off constructs, but it's hard to see how you'd use larger general sum types as one-offs.
Such a type is usually called a sum, variant, union, or tagged union type. Because the capability is a built-in feature of data types in Haskell, there's no name for it widely used in Haskell code. The Report only calls them "algebraic datatypes" (usually abbreviated to ADT), so that's the name you'll see most often in comments, but this name includes types with only one data constructor, which are only sum types in the trivial sense.
In order to prove that, for instance, the Category laws hold for some operations on a data type, how does one decide how to define equality? Consider the following type for representing boolean expressions:
data Exp
= ETrue
| EFalse
| EAnd Exp Exp
deriving (Eq)
Is it feasible trying to prove that Exp forms a Category with identity ETrue and operator:
(<&>) = EAnd
without redefining the Eq instance? Using the default instance of Eq the left-identity law breaks, i.e:
ETrue <&> e == e
evaluates to False. However, defining an eval function:
eval ETrue = True
eval EFalse = False
eval (EAnd e1 e2) = eval e1 && eval e2
and the Eq instance as:
instance Eq Exp where
e1 == e2 = eval e1 == eval e2
fixes the problem. Is comparison in terms of (==) a general requirement for claiming to satisfy such laws, or is it sufficient to say that the laws hold for a particular type of equality operator?
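For reference, the two notions of equality can be kept side by side instead of replacing the derived instance. In this sketch (===) is a hypothetical operator for "equal up to evaluation", and the sample expressions are arbitrary; checking a handful of samples is of course not a proof of the law.

```haskell
data Exp
  = ETrue
  | EFalse
  | EAnd Exp Exp
  deriving (Eq, Show)

eval :: Exp -> Bool
eval ETrue        = True
eval EFalse       = False
eval (EAnd e1 e2) = eval e1 && eval e2

-- Hypothetical operator: equivalence up to evaluation, separate from (==).
infix 4 ===
(===) :: Exp -> Exp -> Bool
e1 === e2 = eval e1 == eval e2

samples :: [Exp]
samples = [ETrue, EFalse, EAnd ETrue EFalse, EAnd ETrue ETrue]

main :: IO ()
main = do
  print (all (\e -> EAnd ETrue e == e) samples)   -- left identity, structural equality
  print (all (\e -> EAnd ETrue e === e) samples)  -- left identity, semantic equivalence
  print (all (\e -> EAnd e ETrue === e) samples)  -- right identity, semantic equivalence
```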
Equality is EVIL. You rarely (if ever) need structural equality,
because it is too strong. You only want an equivalence that is strong enough for
what you're doing. This is particularly true for category theory.
In Haskell, deriving Eq will give you structural equality, which means that you'll
often want to write your own implementation of == / /=.
A simple example: Define rational numbers as pairs of integers,
data Rat = Integer :/ Integer. If you use structural equality (what Haskell is
deriving), you'll have (1:/2) /= (2:/4), but as a fraction 1/2 == 2/4. What
you really care about is the value that your tuples denote, not their
representation. This means you'll need an equivalence that compares reduced
fractions, so you should implement that instead.
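A minimal sketch of such an instance. Note that it uses cross-multiplication rather than reducing both fractions first, which gives the same answer without a gcd step, and it assumes denominators are never zero.

```haskell
-- Rationals as pairs of integers; denominators are assumed to be non-zero.
data Rat = Integer :/ Integer
  deriving Show

-- a/b and c/d denote the same fraction exactly when a*d == c*b.
instance Eq Rat where
  (a :/ b) == (c :/ d) = a * d == c * b

main :: IO ()
main = do
  print ((1 :/ 2) == (2 :/ 4))
  print ((1 :/ 2) == (1 :/ 3))
```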
Side note: If someone using the code assumes that you've defined a structural
equality test, i.e. that checking with == justifies replacing data sub-components
through pattern matching, their code may break. If that is of importance,
you may hide the constructors to disallow pattern matching, or maybe define your
own class (say, Equiv with === and =/=) to separate both concepts. (This
is mostly important for theorem provers like Agda or Coq, in Haskell it's really
hard to get practical/real-world code so wrong that finally something breaks.)
Really Stupid(TM) example: Let's say that person wants to print long lists of huge
Rats and believes memoizing the string representations of the Integers will save
on binary-to-decimal conversion. There's a lookup table for Rats, such that equal
Rats will never be converted twice, and there's a lookup table for integers. If
(a:/b) == (c:/d), missing integer entries will be filled by copying between a-c /
b-d to skip conversion (ouch!). For the list [ 1:/1, 2:/2, 2:/4 ], 1 gets
converted and then, because 1:/1 == 2:/2, the string for 1 gets copied into the
2 lookup entry. The final result "1/1, 1/1, 1/4" is borked.