How to know the data type of a value of a sum type [duplicate] - haskell

What is pattern matching in Haskell and how is it related to guarded equations?
I've tried looking for a simple explanation, but I haven't found one.
EDIT:
Someone tagged this as homework. I don't go to school anymore; I'm just learning Haskell and trying to understand this concept, purely out of interest.

In a nutshell, patterns are like defining piecewise functions in math. You can specify different function bodies for different arguments using patterns. When you call a function, the appropriate body is chosen by comparing the actual arguments with the various argument patterns. Read A Gentle Introduction to Haskell for more information.
Compare the piecewise definition:
fib(0) = 1
fib(1) = 1
fib(n) = fib(n-1) + fib(n-2), if n ≥ 2
with the equivalent Haskell:
fib 0 = 1
fib 1 = 1
fib n | n >= 2 = fib (n-1) + fib (n-2)
Note the "n ≥ 2" in the piecewise function becomes a guard in the Haskell version, but the other two conditions are simply patterns. Patterns are conditions that test values and structure, such as x:xs, (x, y, z), or Just x. In a piecewise definition, conditions based on = or ∈ relations (basically, the conditions that say something "is" something else) become patterns. Guards allow for more general conditions. We could rewrite fib to use guards:
fib n | n == 0 = 1
      | n == 1 = 1
      | n >= 2 = fib (n-1) + fib (n-2)

There are other good answers, so I'm going to give you a very technical answer. Pattern matching is the elimination construct for algebraic data types:
"Elimination construct" means "how to consume or use a value"
"Algebraic data type", in addition to first-class functions, is the big idea in a statically typed functional language like Clean, F#, Haskell, or ML
The idea of algebraic data types is that you define a type of thing, and you say all the ways you can make that thing. As an example, let's define "Sequence of String" as an algebraic data type, with three ways to make it:
data StringSeq = Empty                   -- the empty sequence
               | Cat StringSeq StringSeq -- two sequences in succession
               | Single String           -- a sequence holding a single element
Now, there are all sorts of things wrong with this definition, but as an example it's interesting because it provides constant-time concatenation of sequences of arbitrary length. (There are other ways to achieve this.) The declaration introduces Empty, Cat, and Single, which are all the ways there are of making sequences. (That makes each one an introduction construct—a way to make things.)
You can make an empty sequence without any other values.
To make a sequence with Cat, you need two other sequences.
To make a sequence with Single, you need an element (in this case a string).
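As a small illustration, here is a value built with these introduction constructs (the name greeting is just for this sketch); it represents the two-element sequence "hello", "world":
greeting :: StringSeq
greeting = Cat (Single "hello") (Single "world")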
Here comes the punch line: the elimination construct, pattern matching, gives you a way to scrutinize a sequence and ask it the question "What constructor were you made with?" Because you have to be prepared for any answer, you provide at least one alternative for each constructor. Here's a length function:
slen :: StringSeq -> Int
slen s = case s of
           Empty    -> 0
           Cat s s' -> slen s + slen s'
           Single _ -> 1
At the core of the language, all pattern matching is built on this case construct. However, because algebraic data types and pattern matching are so important to the idioms of the language, there's special "syntactic sugar" for doing pattern matching in the declaration form of a function definition:
slen Empty = 0
slen (Cat s s') = slen s + slen s'
slen (Single _) = 1
With this syntactic sugar, computation by pattern matching looks a lot like definition by equations. (The Haskell committee did this on purpose.) And as you can see in the other answers, it is possible to specialize either an equation or an alternative in a case expression by slapping a guard on it. I can't think of a plausible guard for the sequence example, and there are plenty of examples in the other answers, so I'll leave it there.

Pattern matching is, at least in Haskell, deeply tied to the concept of algebraic data types. When you declare a data type like this:
data SomeData = Foo Int Int
              | Bar String
              | Baz
...it defines Foo, Bar, and Baz as constructors--not to be confused with "constructors" in OOP--that construct a SomeData value out of other values.
Pattern matching is nothing more than doing this in reverse--a pattern would "deconstruct" a SomeData value into its constituent pieces (in fact, I believe that pattern matching is the only way to extract values in Haskell).
When there are multiple constructors for a type, you write a separate equation of the function for each pattern, and the correct one is selected depending on which constructor was used (assuming you've written patterns to match all possible constructions, which it's generally good practice to do).
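For example, here is a minimal sketch of such a function over SomeData (describe is a made-up name, not from the question); each equation deconstructs one constructor, binding the pattern variables to the values that constructor was applied to:
describe :: SomeData -> String
describe (Foo a b) = "a Foo whose Ints sum to " ++ show (a + b)
describe (Bar s)   = "a Bar holding " ++ s
describe Baz       = "just a Baz"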

In a functional language, pattern matching involves checking an argument against different forms. A simple example involves recursively defined operations on lists. I will use OCaml to explain pattern matching since it's my functional language of choice, but the concepts are the same in F# and Haskell, AFAIK.
Here is the definition of a function to compute the length of a list lst. In OCaml, an 'a list is defined recursively as either the empty list [], or the structure h :: t, where h is an element of type 'a ('a being any type we want, such as an integer or even another list), t is a list (hence the recursive definition), and :: is the cons operator, which creates a new list out of an element and a list.
So the function would look like this:
let rec len lst =
  match lst with
  | [] -> 0
  | h :: t -> 1 + len t
rec is a modifier that tells OCaml that a function will call itself recursively. Don't worry about that part. The match expression is what we're focusing on: OCaml will check lst against the two patterns - the empty list, or h :: t - and return a different value based on which one matches. Since we know every list will match one of these patterns, we can rest assured that our function will return safely.
Note that even though these two patterns will take care of all lists, you aren't limited to them. A pattern like h1 :: h2 :: t (matching all lists of length 2 or more) is also valid.
Of course, the use of patterns isn't restricted to recursively defined data structures, or recursive functions. Here is a (contrived) function to tell you whether a number is 1 or 2:
let is_one_or_two num =
  match num with
  | 1 -> true
  | 2 -> true
  | _ -> false
In this case, the forms of our pattern are the numbers themselves. _ is a special catch-all used as a default case, in case none of the above patterns match.

Pattern matching is one of those concepts that is painful to get your head around if you come from a procedural programming background. I found it hard to get into because the same syntax used to create a data structure can also be used for matching.
In F# you can use the cons operator :: to add an element to the beginning of a list like so:
let a = 1 :: [2;3]
//val a : int list = [1; 2; 3]
Similarly you can use the same operator to split the list up like so:
let a = [1;2;3];;
match a with
| [a;b] -> printfn "List contains 2 elements" //will match a list with exactly 2 elements
| a::tail -> printfn "Head is %d" a //will match any non-empty list not caught above
| [] -> printfn "List is empty" //will match an empty list

Related

Haskell - Function Evaluation

I am confused about when Haskell evaluates functions, compared to when it just returns the function itself. I was taught that pattern matching drives function evaluation, but then I don't understand why
f :: Int -> Int
f x = x+1
works. Does f add 1 to an integer, or does it return a function which adds 1 to an integer? Are these two the same thing? There is no pattern matching as far as I can tell, so I'm not sure why it gets evaluated.
Another question: suppose I want to make an 8x8 list that contains all 0's, except the first row contains the numbers 1,2,3,4,5,6,7,8 instead. Is there any way I could initialize it to all 0's first and then change the first row to [1..8]? I understand that it's not idiomatic to make sequential code like this, so is there a better way to do it, hopefully without using do blocks?
Finally, I am also confused about the let and where syntax. Suppose that in the middle of a function definition, I say temp = x + 1. How is this different from saying let temp = x + 1 or ...temp where temp = x + 1? In each of these cases, does temp have type Int or Int -> Int? Why do people use do with let so often?
This certainly was a collection of questions.
Firstly, about evaluation: Haskell is lazy. It will evaluate values as they are needed. This includes the fact that a function is not necessarily evaluated in its entirety. Pattern matching may drive evaluation in some cases; for instance, in maybe, either a Nothing or a Just x must match in order to find out what value is produced. But if you didn't demand the result in the first place, this matching is never needed.
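A tiny illustration of this laziness: length pattern-matches on the list's cons cells (its spine) but never demands the elements, so the undefineds below are never evaluated.
spineOnly :: Int
spineOnly = length [undefined, undefined] -- evaluates to 2 rather than crashing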
f in your example is a function known as (+1), or more explicitly \x -> x + 1. Being a function, it must be applied to a value to produce another, and there is in fact a pattern: the argument x, having type Int. This works as a simple binding, but it could have been a constant value pattern like 1 instead. Here's an example:
fib :: Int -> Int
fib 0 = 1
fib 1 = 1
fib n = fib (n-1) + fib (n-2)
The first two patterns give us our base cases.
An 8x8 grid of numbers is a matrix, not a list. Data.Array, Data.Matrix and Data.Vector provide types that can describe such things more accurately, and what you describe can be done. Data.Ix provides multidimensional indices and functions like Data.Vector.modify may perform updates in place without violating value immutability.
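That said, a plain list of lists can express the grid described in the question; a minimal sketch (the name grid is just illustrative):
grid :: [[Int]]
grid = [1..8] : replicate 7 (replicate 8 0)
-- head grid == [1,2,3,4,5,6,7,8]; the remaining seven rows are all zeros
Because values are immutable, "changing" the first row really means building a new list that shares the untouched rows.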
let bindings in expression and expression where bindings are mostly a matter of preference (a let binding within a do block is a different matter). In your sample binding temp = x + 1, x must be bound from elsewhere; + is of type Num a => a -> a -> a, so both x and temp must be the same type Num a => a. temp takes no argument, so it is just a value of that type, not a function (though mathematically it's a function of x).
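To make the comparison concrete, these two hypothetical definitions are equivalent, and in both of them temp has the same numeric type as x:
g1 x = let temp = x + 1 in temp * 2
g2 x = temp * 2 where temp = x + 1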
As for do with let, it's essentially a shorthand for adding another binding; you could write:
main = do
  putStrLn "hello"
  let word = "world"
  putStrLn word
and it's equivalent to:
main = do
  putStrLn "hello"
  let word = "world" in do
    putStrLn word
This provides a way to introduce a pure value mid-do, like <- introduces monadic ones.
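A minimal contrast of the two, for illustration:
main = do
  line <- getLine           -- <- binds the result of a monadic action
  let shouted = line ++ "!" -- let binds a pure value
  putStrLn shouted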

The meaning of the universal quantification

I am trying to understand the meaning of the universal quantification from the following page http://dev.stephendiehl.com/hask/#universal-quantification.
I am not sure if I understand this sentence correctly:
The essence of universal quantification is that we can express functions which operate the same way for a set of types and whose function behavior is entirely determined only by the behavior of all types in this span.
Let's take the function from the example:
-- ∀a. [a]
example1 :: forall a. [a]
example1 = []
What I can do with example1 is use any function that is defined for the list type.
But I still don't get the exact purpose of universal quantification in Haskell.
I need a collection of numbers, and I need to be able to easily insert into the middle of the list, so I decide on making a linked list. Being a savvy Hask-- programmer (Hask-- being the variant of Haskell that does not have universal quantification!), I quickly whip up a type and a length function without trouble:
data IntLinkedList = IntNil | IntCons Int IntLinkedList
length_IntLinkedList :: IntLinkedList -> Int
length_IntLinkedList IntNil = 0
length_IntLinkedList (IntCons _ tail) = 1 + length_IntLinkedList tail
Later I realize it would be handy to have a variant type that can store numbers not quite as big as 1 and not quite as small as 0. No problem...
data FloatLinkedList = FloatNil | FloatCons Float FloatLinkedList
length_FloatLinkedList :: FloatLinkedList -> Int
length_FloatLinkedList FloatNil = 0
length_FloatLinkedList (FloatCons _ tail) = 1 + length_FloatLinkedList tail
Boy that code looks awfully familiar! And if, later, I discover it would be nice to have a variant that can store Strings I am once again left copying and pasting the exact same code, and tweaking the exact same places that are specific to the contained type. Wouldn't it be nice if there were a way to just cook up a linked list once and for all that could contain elements of any single type, and a length function that worked uniformly no matter what elements it had? After all, our length functions above didn't even care what values the elements had. In Haskell, this is exactly what universal quantification gives you: a way to write a single function which works with an entire collection of types.
Here's how it looks:
data LinkedList a = Nil | Cons a (LinkedList a)
length_LinkedList :: forall a. LinkedList a -> Int
length_LinkedList Nil = 0
length_LinkedList (Cons _ tail) = 1 + length_LinkedList tail
The forall says that this function works for all variants of linked lists: linked lists of Ints, linked lists of Floats, linked lists of Strings, linked lists of functions that take FibbledyGibbets and return linked lists of tuples of Grazbars and WonkyNobbers, ...
How nice! Now instead of separate IntLinkedList and FloatLinkedList types, we can just use LinkedList Int and LinkedList Float for that, and length_LinkedList, implemented once, works for both.
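A quick usage sketch (the values are invented for illustration):
twoInts :: Int
twoInts = length_LinkedList (Cons 1 (Cons 2 Nil)) -- 2
threeStrings :: Int
threeStrings = length_LinkedList (Cons "a" (Cons "b" (Cons "c" Nil))) -- 3
The single length_LinkedList serves both element types, which is exactly what the duplicated Int and Float versions could not do.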

How to define multiple patterns in Frege?

I'm having some trouble defining a function in Frege that uses multiple patterns. Basically, I'm defining a mapping by iterating through a list of tuples. I've simplified it down to the following:
foo :: a -> [(a, b)] -> b
foo _ [] = [] --nothing found
foo bar (baz, zab):foobar
| bar == baz = zab
| otherwise = foo bar foobar
I get the following error:
E morse.fr:3: redefinition of `foo` introduced line 2
I've seen other examples like this that do use multiple patterns in a function definition, so I don't know what I'm doing wrong. Why am I getting an error here? I'm new to Frege (and new to Haskell), so there may be something simple I'm missing, but I really don't think this should be a problem.
I'm compiling with version 3.24-7.100.
This is a purely syntactical problem that affects newcomers to languages of the Haskell family. It won't take too long until you internalize the rule that function application has higher precedence than infix expressions.
This has consequences:
Complex arguments of function application need parentheses.
In infix expressions, function applications on either side of the operator do not need parentheses (however, individual components of function application may still need them).
In Frege, in addition, the following rule holds:
The syntax of function application and infix expressions on the left-hand side of a definition is identical to the one on the right-hand side as far as lexemes allowed on both sides are concerned. (This holds in Haskell only when @ and ~ are not used.)
This is so you can define an addition function like this:
data Number = Z | Succ Number
a + Z = a
a + Succ b = Succ a + b
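A short reduction trace (using this Number type) shows how that reads in practice; on the right-hand side, Succ a + b parses as (Succ a) + b because application binds tighter:
-- Succ Z + Succ Z
-- = Succ (Succ Z) + Z   (second equation, with a = Succ Z, b = Z)
-- = Succ (Succ Z)       (first equation)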
Hence, when you apply this to your example, you see that syntactically, you're going to redefine the : operator. To achieve what you want, you need to write it thus:
foo bar ((baz, zab):foobar) = ....
--      ^                 ^
This corresponds to the situation where you apply foo to a list you are constructing:
foo 42 (x:xs)
When you write
foo 42 x:xs
this means
(foo 42 x):xs
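For completeness, here is a sketch of the corrected definition carried through. Note that two further details of the simplified example would also need attention: the [] result in the empty case cannot have the declared result type b, and bar == baz needs an Eq constraint; error is one hypothetical way to handle the not-found case.
foo :: Eq a => a -> [(a, b)] -> b
foo _ [] = error "nothing found"
foo bar ((baz, zab):foobar)
  | bar == baz = zab
  | otherwise  = foo bar foobar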

Why should I use case expressions if I can use "equations"?

I'm learning Haskell, from the book "Real World Haskell". In pages 66 and 67, they show the case expressions with this example:
fromMaybe defval wrapped =
  case wrapped of
    Nothing    -> defval
    Just value -> value
I remember a similar thing in F#, but (as shown earlier in the book) Haskell can define functions as series of equations, while AFAIK F# cannot. So I tried to define this in such a way:
fromMaybe2 defval Nothing = defval
fromMaybe2 defval (Just value) = value
I loaded it in GHCi and after a couple of results, I convinced myself it was the same. However, this makes me wonder: why should there be case expressions when equations:
are more comprehensible (it's Mathematics; why use case something of, who says that?);
are less verbose (2 vs 4 lines);
require much less structuring and syntactic sugar (-> could be an operator; look what they've done!);
only use variables when needed (in basic cases such as this, wrapped just takes up space).
What's good about case expressions? Do they exist only because similar FP-based languages (such as F#) have them? Am I missing something?
Edit:
I see from #freyrs's answer that the compiler makes these exactly the same. So, equations can always be turned into case expressions (as expected). My next question is the converse; can one go the opposite route of the compiler and use equations with let/where expressions to express any case expression?
This comes from a culture of having small "kernel" expression-oriented languages. Haskell grows from Lisp's roots (i.e. lambda calculus and combinatory logic); it's basically Lisp plus syntax plus explicit data type definitions plus pattern matching minus mutation plus lazy evaluation (lazy evaluation was itself first described in the context of Lisp, AFAIK, in the '70s).
Lisp-like languages are expression-oriented, i.e. everything is an expression, and a language's semantics is given as a set of reduction rules, turning more complex expressions into simpler ones, and ultimately into "values".
Equations are not expressions. Several equations could be somehow mashed into one expression; you'd have to introduce some syntax for that; case is that syntax.
Haskell's rich syntax gets translated into a smaller "core" language that has case as one of its basic building blocks. case has to be a basic construct, because pattern matching in Haskell is made to be such a basic, core feature of the language.
To your new question, yes you can, by introducing auxiliary functions as Luis Casillas shows in his answer, or with the use of pattern guards, so his example becomes:
foo x y | (Meh o p) <- z = baz y p o
        | (Gah t q) <- z = quux x t q
  where z = bar x
The two functions compile into exactly the same internal code in Haskell ( called Core ) which you can dump out by passing the flags -ddump-simpl -dsuppress-all to ghc.
It may look a bit intimidating with the generated variable names, but it's effectively just an explicitly typed version of the code you wrote above. The only difference is the variable names.
fromMaybe2
fromMaybe2 =
  \ @ t_aBC defval_aB6 ds_dCK ->
    case ds_dCK of _ {
      Nothing -> defval_aB6;
      Just value_aB8 -> value_aB8
    }
fromMaybe
fromMaybe =
  \ @ t_aBJ defval_aB3 wrapped_aB4 ->
    case wrapped_aB4 of _ {
      Nothing -> defval_aB3;
      Just value_aB5 -> value_aB5
    }
The paper "A History of Haskell: Being Lazy with Class" (PDF) provides some useful perspective on this question. Section 4.4 ("Declaration style vs. expression style," p.13) is about this topic. The money quote:
[W]e engaged in furious debate about which style was “better.” An underlying assumption was that if possible there should be “just one way to do something,” so that, for example, having both let and where would be redundant and confusing. [...] In the end, we abandoned the underlying assumption, and provided full syntactic support for both styles.
Basically they couldn't agree on one so they threw both in. (Note that quote is explicitly about let and where, but they treat both that choice and the case vs. equations choice as two manifestations of the same basic choice—what they call "declaration style" vs. "expression style.")
In modern practice, the declaration style (your "series of equations") has become the more common one. case is often seen in this situation, where you need to match on a value that is computed from one of the arguments:
foo x y = case bar x of
  Meh o p -> baz y p o
  Gah t q -> quux x t q
You can always rewrite this to use an auxiliary function:
foo x y = go (bar x)
  where go (Meh o p) = baz y p o
        go (Gah t q) = quux x t q
This has the very minor disadvantage that you need to name your auxiliary function—but go is normally a perfectly fine name in this situation.
A case expression can be used anywhere an expression is expected, while equations can't. Example:
1 + (case even 9 of True -> 2; _ -> 3)
You can even nest case expressions, and I've seen code that does that. However, I tend to stay away from case expressions and try to solve the problem with equations, even if I have to introduce a local function using where or let.
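For instance, a hypothetical equational rewrite of the expression above, using a local helper (the names four and g are mine):
four = 1 + g (even 9)
  where g True = 2
        g _    = 3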
Every definition using equations is equivalent to one using case. For instance
negate True = False
negate False = True
stands for
negate x = case x of
  True  -> False
  False -> True
That is to say, these are two ways of expressing the same thing, and the former is translated to the latter by GHC.
From the Haskell code that I've read, it seems canonical to use the first style wherever possible.
See section 4.4.3.1 of the Haskell '98 report.
The answer to your added question is yes, but it's pretty ugly.
case exp of
  pat1 -> e1
  pat2 -> e2
  etc.
can, I believe, be emulated by
let
  f pat1 = e1
  f pat2 = e2
  etc.
in f exp
as long as f is not free in exp, e1, e2, etc. But you shouldn't do that because it's horrible.

The meaning of ' in Haskell function name?

What is the quote ' used for? I have read about curried functions and seen two ways of defining the add function, curried and uncurried. The curried version...
myadd' :: Int -> Int -> Int
myadd' x y = x + y
...but it works equally well without the quote. So what is the point of the '?
The quote means nothing to Haskell. It is just part of the name of that function.
People tend to use this for "internal" functions. If you have a function that sums a list by using an accumulator argument, your sum function will take two args. This is ugly, so you make a sum' function of two args, and a sum function of one arg like sum list = sum' 0 list.
Edit, perhaps I should just show the code:
sum' s [] = s
sum' s (x:xs) = sum' (s + x) xs
sum xs = sum' 0 xs
You do this so that sum' is tail-recursive, and so that the "public API" is nice looking.
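A hypothetical trace of sum [1,2,3] shows why the accumulator version is tail-recursive: each step is a direct call with an updated accumulator, so no stack of pending additions builds up.
-- sum [1,2,3]
-- = sum' 0 [1,2,3]
-- = sum' 1 [2,3]
-- = sum' 3 [3]
-- = sum' 6 []
-- = 6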
It is often pronounced "prime", so myadd' would be read "myadd prime". It is usually used to denote the next step in a computation, or an alternative.
So, you can say
add = blah
add' = different blah
Or
f x =
  let x' = subcomputation x
  in blah
It's just a habit, like using int i as the index in a for loop in Java, C, etc.
Edit: This answer is hopefully more helpful now that I've added all the words, and code formatting. :) I keep on forgetting that this is not a WYSIWYG system!
There's no particular point to the ' character in this instance; it's just part of the identifier. In other words, myadd and myadd' are distinct, unrelated functions.
Conventionally though, the ' is used to denote some logical evaluation relationship. So, hypothetical functions myadd and myadd' would be related such that myadd' could be derived from myadd. This is a convention derived from formal logic and proofs in academia (where Haskell has its roots). I should underscore that this is only a convention; Haskell does not enforce it.
The quote ' is just another allowed character in Haskell names. It's often used to define variants of functions, in which case the quote is pronounced "prime". Specifically, the Haskell libraries use quote-variants to show that the variant is strict. For example: foldl is lazy, foldl' is strict.
In this case, it looks like the quote is just used to separate the curried and uncurried variants.
As said by others, the ' does not hold any meaning for Haskell itself. It is just a character, like the a letter or a number.
The ' is used to denote alternative versions of a function (in the case of foldl and foldl') or helper functions. Sometimes, you'll even see several ' on a function name. Adding a ' to the end of a function name is just much more concise than writing someFunctionHelper and someFunctionStrict.
The origin of this notation is in mathematics and physics, where, if you have a function f(x), its derivate is often denoted as f'(x).
