Limitations of LL vs LR parsers? - programming-languages

I know the basic differences of LL vs LR parsers. I also know that GLR, SLR, and LALR are all extensions of LR parsers. So my question in more details is...
Given a LL(*) parser and any variation on a LR parser, is there any language that can be described in one and not the other? Or more simply is there any feature or property that can not be expressed with either?
As a concrete example. If I were to create a language using an LL(*) parser, will I ever run into desired feature/property that I might want to add to my language that would only be possible with a LR parser (or vice versa)?

Here are a couple of opinion pieces, you can consider them point and counterpoint:
Parsing ought to be easier - in favor of LL
Why I prefer LALR parsers - in favor of LR

Sure. LL parsers cannot handle any grammars with left recursion. L(AL)R(k) parsers for fixed k won't be able to parse some things that an LL(*) parser can handle, because k<*.

You might find interesting this paragraph in Wikipedia, which says, that LL(*) grammars are subset of LR(k) grammars:
http://en.wikipedia.org/wiki/Context-free_grammar#Restrictions
So you can parse more languages using LR parsing methods.

There are some grammars that cannot be "rewritten" to be parsed by LL parsers which can be parsed by LR parsers. One example: Given a simple grammar that constructs terms with substractions:
S -> S - S | num
You obviously have left-recursion here, which cannot be handled by LL-parsers. To make this grammar parsable by LL, you have to eliminate the left-recursion:
S -> num S'
S' -> - num S' | epsilon
Now your LL parser can handle this grammar. But when building up your syntax tree for a term like 4 - 2 -1, a depth-first-search operated on the tree will give you 4 - (2 - 1) = 3 instead of (4 - 2) - 1 = 3 as you would expect.
The reason for this is, that you have to use left-recursive rules in your grammar to handle left-associative operators (like substract). But left-recursive rules cannot be handled by LL-parsers.
So here you have a class of languages that cannot be handled by LL.

LR parser can accept bigger class of languages than LL. In LL(k) and LR(k), k means number of lookahead symbols that it needs to know so it can apply appropriate production/reduction. The bigger k get, the bigger are parsing tables. So k is not limiting only LR parsers, it is limiting LL parsers, too. The reason why LR parser can accept bigger class of languages is because of left recursion that are problematic when working with LL parsers. But this is not true entirely, because direct recursion are solvable, meaning that you can rewrite the grammar to be LL grammar. Direct recursion is something like A -> Abc. When you have indirect recursion, for which you now probably know how it looks, then you have a problem. LR parsers can solve this issues because they make parse tree in bottom-up manner. You will have to look little bit deeper into LR parsing to see why is that exactly. But, LR parsers are not all mighty, they have limitations, too. Some grammars are very difficult to digest and k factor does not help. For this kind of grammars GLR parser is needed, which actually simulates LR(k) parser but uses backtracking to analyze entire parse space when production/reduction ambiguity occurs.

An LR (1) parser recognizes more grammars than an LL (1) because it doesn't need us to rewrite the grammar to remove left recursion and common factors.
However, given a unambiguous grammar, we can always remove left recursion and common factors, and so in the end they can both parse the same languages.
If anyone has doubts about this, I challenge you to give an unambiguous grammar that an LL(1) cannot parse and an LR (1) can, that is, that cannot be rewritten to remove left recursion and factors common

LL parsing is theoretically O(n^4), or pretty dang slow.
LR parsing is faster, O(n^3), or pretty slow.
https://en.wikipedia.org/wiki/Top-down_parsing
Although I'd love to see proof of this.

Related

How to build Abstract Syntax Trees from grammar specification in Haskell?

I'm working on a project which involves optimizing certain constructs in a very small subset of Java, formalized in BNF.
If I were to do this in Java, I would use a combination of JTB and JavaCC which builds an AST. Visitors are then used to manipulate the tree. But, given the vast libraries for parsing in Haskell (parsec, happy, alex etc), I'm a bit confused in chossing the appropriate library.
So, simply put, when a language is specified in BNF, which library offers the easiest means to build an AST? And what is the best way to go about modifying this tree in idiomatic Haskell?
Well in Haskell there are 2 main ways of parsing something, parse combinators or a parser generator. Since you already have a BNF I'd suggest the latter.
A good one is alex. GHC's parser IIRC is written using this so you'd be in good company.
Next you'll have a big honking stack of data declarations to parse into:
data JavaClass = {
className :: Name,
interfaces :: [Name],
contents :: [ClassContents],
...
}
data ClassContents = M Method
| F Field
| IC InnerClass
and for expressions and whatever else you need. Finally you'll combine these into something like
data TopLevel = JC JavaClass
| WhateverOtherForms
| YouWillParse
Once you have this you'll have the entire AST represented as one TopLevel or a list of them depending on how many you classes/files you parse.
To proceed from here depends on what you want to do. There are a number of libraries such as syb (scrap your boilerplate) that let you write very concise tree traversals and modifications. lens is also an option. At a minimum check out Data.Traversable and Data.Foldable.
To modify the tree, you can do something as simple as
ignoreInnerClasses :: JavaClass -> JavaClass
ignoreInnerContents c = c{contents = filter isClass $ contents c}
-- ^^^ that is called a record update
where isClass (IC _) = True
isClass _ = False
and then you could potentially use something like syb to write
everywhere (mkT ignoreInnerClass) toplevel
which will traverse everything and apply ignoreInnerClass to all JavaClasses. This is possible to do in lens and many other libraries too, but syb is very easy to read.
I've never used bnfc-meta (suggested by #phg), but I would strongly recommend you look into BNFC (on hackage: http://hackage.haskell.org/package/BNFC). The basic approach is that you write your grammar in an annotated BNF style, and it will automatically generate an AST, parser, and pretty-printer for the grammar.
How suitable BNFC is depends upon the complexity of your grammar. If it's not context-free, you'll likely have a difficult time making any progress (I did make some success hacking up context-sensitive extensions, but that code's likely bit-rotted by now). The other downside is that your AST will very directly reflect the grammar specification. But since you already have a BNF specification, adding the necessary annotations for BNFC should be rather straightforward, so it's probably the fastest way to get a usable AST. Even if you decide to go a different route, you might be able to take the generated data types as a starting point for a hand-written version.
Alex + Happy.
There are many approaches to modify/investigate the parsed terms (ASTs). The keyword to search for is "datatype-generic" programming. But beware: it is a complex topic ...
http://people.cs.uu.nl/andres/Rec/MutualRec.pdf
http://www.cs.uu.nl/wiki/GenericProgramming/Multirec
It has a generic implementation of the zipper available here:
http://hackage.haskell.org/packages/archive/zipper/0.3/doc/html/Generics-MultiRec-Zipper.html
Also checkout https://github.com/pascalh/Astview
You might also check out the Haskell Compiler Series which is nice as an introduction to using alex and happy to parse a subset of Java: http://bjbell.wordpress.com/haskell-compiler-series/.
Since your grammar can be expressed in BNF, it is in the class of grammars that are efficiently parseable with a shift-reduce parser (LALR grammars). Such efficient parsers can be generated by the parser generator yacc/bison (C,C++), or its Haskell equivalent "Happy".
That's why I would use "Happy" in your case. It takes grammar rules in BNF form and generates a parser from it directly. The resulting parser will accept the language that is described by your grammar rules and produce an AST (abstract syntax tree). The Happy user guide is quite nice and gets you started quickly:
http://www.haskell.org/happy/doc/html/
To transform the resulting AST, generic programming is a good idea. Here is a classical explanation on how to do this in Haskell in a practical fashion, from scratch:
http://research.microsoft.com/en-us/um/people/simonpj/papers/hmap/
I have used exactly this to build a compiler for a small domain specific language, and it was a simple and concise solution.

Parsec vs Yacc/Bison/Antlr: Why and when to use Parsec?

I'm new to Haskell and Parsec. After reading Chapter 16 Using Parsec of Real World Haskell, a question appeared in my mind: Why and when is Parsec better than other parser generators like Yacc/Bison/Antlr?
My understanding is that Parsec creates a nice DSL of writing parsers and Haskell makes it very easy and expressive. But parsing is such a standard/popular technology that deserves its own language, which outputs to multiple target languages. So when shall we use Parsec instead of, say, generating Haskell code from Bison/Antlr?
This question might go a little beyond technology, and into the realm of industry practice. When writing a parser from scratch, what's the benefit of picking up Haskell/Parsec compared to Bison/Antlr or something similar?
BTW: my question is quite similar to this one but wasn't answered satisfactorily there.
One of the main differences between the tools you listed, is that ANTLR, Bison and their friends are parser generators, whereas Parsec is a parser combinator library.
A parser generator reads in a description of a grammar and spits out a parser. It is generally not possible to combine existing grammars into a new grammar, and it is certainly not possible to combine two existing generated parsers into a new parser.
A parser combinator OTOH does nothing but combine existing parsers into new parsers. Usually, a parser combinator library ships with a couple of trivial built-in parsers that can parse the empty string or a single character, and it ships with a set of combinators that take 1 or more parsers and return a new one that, for example, parses the sequence of the original parsers (e.g. you can combine a d parser and an o parser to form a do parser), the alternation of the original parsers (e.g. a 0 parser and a 1 parser to a 0|1 parser) or parses the original parse multiple times (repetetion).
What this means is that you could, for example, take an existing parser for Java and an existing parser for HTML and combine them into a parser for JSP.
Most parser generators don't support this, or only support it in a limited way. Parser combinators OTOH only support this and nothing else.
You might want to see this question as well as the linked one in your question.
Which Haskell parsing technology is most pleasant to use, and why?
In Haskell the competition is between Parsec (and other parser combinators) and the parser generator Happy. I'd pick Happy if I already had an LR grammar to work from - parser combinators take grammars in LL form and the translation from LR to LL takes some effort and a combinator parser will usually be significantly slower. If I don't have a grammar I'll use Parsec, it is more flexible (powerful) than Happy and its more fun to work "in Haskell" than generate code with Happy and Alex. If you use Happy for parsing you almost always need to use Alex for lexing.
For industry practice, it would be odd to decide to use Haskell just to get Parsec. For parsing, most of the current crop of languages will have at least a parser generator and probably something more flexible like a port of Parsec or a PEG system.
Ira Baxter's answer to the linked question was spot-on about a parser getting you merely to the foothold of the Himalayas for writing a translator, but being part of a translator is only one of the uses for a parser, so there are still many domains where fairly minimalist systems like ANTLR, Happy and Parsec are satisfactory.
Following on from stephen's answer, I think that one of the most common alternatives to Parsec, if you want to stick with parser combinators, is attoparsec. The main difference is that attoparsec was written with more of a bias towards speed, and makes trade-offs accordingly. For example, Parsec does some book-keeping to try to return helpful error messages if a parse fails, which attoparsec doesn't do to the same extent. Also, I think that attoparsec is specialised to one input stream/token type, whereas Parsec abstracts from the input type so that it can parse streams of type String, ByteString, Text, etc. without problem.

Write a Haskell interpreter in Haskell

A classic programming exercise is to write a Lisp/Scheme interpreter in Lisp/Scheme. The power of the full language can be leveraged to produce an interpreter for a subset of the language.
Is there a similar exercise for Haskell? I'd like to implement a subset of Haskell using Haskell as the engine. Of course it can be done, but are there any online resources available to look at?
Here's the backstory.
I am exploring the idea of using Haskell as a language to explore some of the concepts in a Discrete Structures course I am teaching. For this semester I have settled on Miranda, a smaller language that inspired Haskell. Miranda does about 90% of what I'd like it to do, but Haskell does about 2000%. :)
So my idea is to create a language that has exactly the features of Haskell that I'd like and disallows everything else. As the students progress, I can selectively "turn on" various features once they've mastered the basics.
Pedagogical "language levels" have been used successfully to teach Java and Scheme. By limiting what they can do, you can prevent them from shooting themselves in the foot while they are still mastering the syntax and concepts you are trying to teach. And you can offer better error messages.
I love your goal, but it's a big job. A couple of hints:
I've worked on GHC, and you don't want any part of the sources. Hugs is a much simpler, cleaner implementation but unfortunately it's in C.
It's a small piece of the puzzle, but Mark Jones wrote a beautiful paper called Typing Haskell in Haskell which would be a great starting point for your front end.
Good luck! Identifying language levels for Haskell, with supporting evidence from the classroom, would be of great benefit to the community and definitely a publishable result!
There is a complete Haskell parser: http://hackage.haskell.org/package/haskell-src-exts
Once you've parsed it, stripping out or disallowing certain things is easy. I did this for tryhaskell.org to disallow import statements, to support top-level definitions, etc.
Just parse the module:
parseModule :: String -> ParseResult Module
Then you have an AST for a module:
Module SrcLoc ModuleName [ModulePragma] (Maybe WarningText) (Maybe [ExportSpec]) [ImportDecl] [Decl]
The Decl type is extensive: http://hackage.haskell.org/packages/archive/haskell-src-exts/1.9.0/doc/html/Language-Haskell-Exts-Syntax.html#t%3ADecl
All you need to do is define a white-list -- of what declarations, imports, symbols, syntax is available, then walk the AST and throw a "parse error" on anything you don't want them to be aware of yet. You can use the SrcLoc value attached to every node in the AST:
data SrcLoc = SrcLoc
{ srcFilename :: String
, srcLine :: Int
, srcColumn :: Int
}
There's no need to re-implement Haskell. If you want to provide more friendly compile errors, just parse the code, filter it, send it to the compiler, and parse the compiler output. If it's a "couldn't match expected type a against inferred a -> b" then you know it's probably too few arguments to a function.
Unless you really really want to spend time implementing Haskell from scratch or messing with the internals of Hugs, or some dumb implementation, I think you should just filter what gets passed to GHC. That way, if your students want to take their code-base and take it to the next step and write some real fully fledged Haskell code, the transition is transparent.
Do you want to build your interpreter from scratch? Begin with implementing an easier functional language like the lambda calculus or a lisp variant. For the latter there is a quite nice wikibook called Write yourself a Scheme in 48 hours giving a cool and pragmatic introduction into parsing and interpretation techniques.
Interpreting Haskell by hand will be much more complex since you'll have to deal with highly complex features like typeclasses, an extremely powerful type system (type-inference!) and lazy-evaluation (reduction techniques).
So you should define a quite little subset of Haskell to work with and then maybe start by extending the Scheme-example step by step.
Addition:
Note that in Haskell, you have full access to the interpreters API (at least under GHC) including parsers, compilers and of course interpreters.
The package to use is hint (Language.Haskell.*). I have unfortunately neither found online tutorials on this nor tried it out by myself but it looks quite promising.
create a language that has exactly the features of Haskell that I'd like and disallows everything else. As the students progress, I can selectively "turn on" various features once they've mastered the basics.
I suggest a simpler (as in less work involved) solution to this problem. Instead of creating a Haskell implementation where you can turn features off, wrap a Haskell compiler with a program that first checks that the code doesn't use any feature you disallow, and then uses the ready-made compiler to compile it.
That would be similar to HLint (and also kind of its opposite):
HLint (formerly Dr. Haskell) reads Haskell programs and suggests changes that hopefully make them easier to read. HLint also makes it easy to disable unwanted suggestions, and to add your own custom suggestions.
Implement your own HLint "suggestions" to not use the features you don't allow
Disable all the standard HLint suggestions.
Make your wrapper run your modified HLint as a first step
Treat HLint suggestions as errors. That is, if HLint "complained" then the program doesn't proceed to compilation stage
Baskell is a teaching implementation, http://hackage.haskell.org/package/baskell
You might start by picking just, say, the type system to implement. That's about as complicated as an interpreter for Scheme, http://hackage.haskell.org/package/thih
The EHC series of compilers is probably the best bet: it's actively developed and seems to be exactly what you want - a series of small lambda calculi compilers/interpreters culminating in Haskell '98.
But you could also look at the various languages developed in Pierce's Types and Programming Languages, or the Helium interpreter (a crippled Haskell intended for students http://en.wikipedia.org/wiki/Helium_(Haskell)).
If you're looking for a subset of Haskell that's easy to implement, you can do away with type classes and type checking. Without type classes, you don't need type inference to evaluate Haskell code.
I wrote a self-compiling Haskell subset compiler for a Code Golf challenge. It takes Haskell subset code on input and produces C code on output. I'm sorry there isn't a more readable version available; I lifted nested definitions by hand in the process of making it self-compiling.
For a student interested in implementing an interpreter for a subset of Haskell, I would recommend starting with the following features:
Lazy evaluation. If the interpreter is in Haskell, you might not have to do anything for this.
Function definitions with pattern-matched arguments and guards. Only worry about variable, cons, nil, and _ patterns.
Simple expression syntax:
Integer literals
Character literals
[] (nil)
Function application (left associative)
Infix : (cons, right associative)
Parenthesis
Variable names
Function names
More concretely, write an interpreter that can run this:
-- tail :: [a] -> [a]
tail (_:xs) = xs
-- append :: [a] -> [a] -> [a]
append [] ys = ys
append (x:xs) ys = x : append xs ys
-- zipWith :: (a -> b -> c) -> [a] -> [b] -> [c]
zipWith f (a:as) (b:bs) = f a b : zipWith f as bs
zipWith _ _ _ = []
-- showList :: (a -> String) -> [a] -> String
showList _ [] = '[' : ']' : []
showList show (x:xs) = '[' : append (show x) (showItems show xs)
-- showItems :: (a -> String) -> [a] -> String
showItems show [] = ']' : []
showItems show (x:xs) = ',' : append (show x) (showItems show xs)
-- fibs :: [Int]
fibs = 0 : 1 : zipWith add fibs (tail fibs)
-- main :: String
main = showList showInt (take 40 fibs)
Type checking is a crucial feature of Haskell. However, going from nothing to a type-checking Haskell compiler is very difficult. If you start by writing an interpreter for the above, adding type checking to it should be less daunting.
You might look at Happy (a yacc-like parser in Haskell) which has a Haskell parser.
This might be a good idea - make a tiny version of NetLogo in Haskell. Here is the tiny interpreter.
see if helium would make a better base to build upon than standard haskell.
Uhc/Ehc is a series of compilers enabling/disabling various Haskell features.
http://www.cs.uu.nl/wiki/Ehc/WebHome#What_is_UHC_And_EHC
I've been told that Idris has a fairly compact parser, not sure if it's really suitable for alteration, but it's written in Haskell.
Andrej Bauer's Programming Language Zoo has a small implementation of a purely functional programming language somewhat cheekily named "minihaskell". It is about 700 lines of OCaml, so very easy to digest.
The site also contains toy versions of ML-style, Prolog-style and OO programming languages.
Don't you think it would be easier to take the GHC sources and strip out what you don't want, than it would be to write your own Haskell interpreter from scratch? Generally speaking, there should be a lot less effort involved in removing features as opposed to creating/adding features.
GHC is written in Haskell anyway, so technically that stays with your question of a Haskell interpreter written in Haskell.
It probably wouldn't be too hard to make the whole thing statically linked and then only distribute your customized GHCi, so that the students can't load other Haskell source modules. As to how much work it would take to prevent them from loading other Haskell object files, I have no idea. You might want to disable FFI too, if you have a bunch of cheaters in your classes :)
The reason why there are so many LISP interpreters is that LISP is basically a predecessor of JSON: a simple format to encode data. This makes the frontend part quite easy to handle. Compared to that, Haskell, especially with Language Extensions, is not the easiest language to parse.
These are some syntactical constructs that sound tricky to get right:
operators with configurable precedence, associativity, and fixity,
nested comments
layout rule
pattern syntax
do- blocks and desugaring to monadic code
Each of these, except maybe the operators, could be tackled by students after their Compiler Construction Course, but it would take the focus away from how Haskell actually works. In addition to that, you might not want to implement all syntactical constructs of Haskell directly, but instead implement passes to get rid of them. Which brings us to the literal core of the issue, pun fully intended.
My suggestion is to implement typechecking and an interpreter for Core instead of full Haskell. Both of these tasks are quite intricate by themselves already.
This language, while still a strongly typed functional language, is way less complicated to deal with in terms of optimization and code generation.
However, it is still independent from the underlying machine.
Therefore, GHC uses it as an intermediary language and translates most syntaxical constructs of Haskell into it.
Additionally, you should not shy away from using GHC's (or another compiler's) frontend.
I'd not consider that as cheating since custom LISPs use the host LISP system's parser (at least during bootstrapping). Cleaning up Core snippets and presenting them to students, along with the original code, should allow you to give an overview of what the frontend does, and why it is preferable to not reimplement it.
Here are a few links to the documentation of Core as used in GHC:
System FC: equality constraints and coercions
GHC/As a library
The Core type

Haskell or Standard ML for beginners? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 6 years ago.
Improve this question
I'm going to be teaching a lower-division course in discrete structures. I have selected the text book Discrete Structures, Logic, and Computability in part because it contains examples and concepts that are conducive to implementation with a functional programming language. (I also think it's a good textbook.)
I want an easy-to-understand FP language to illustrate DS concepts and that the students can use. Most students will have had only one or two semesters of programming in Java, at best. After looking at Scheme, Erlang, Haskell, Ocaml, and SML, I've settled on either Haskell or Standard ML. I'm leaning towards Haskell for the reasons outlined below, but I'd like the opinion of those who are active programmers in one or the other.
Both Haskell and SML have pattern matching which makes describing a recursive algorithm a cinch.
Haskell has nice list comprehensions that match nicely with the way such lists are expressed mathematically.
Haskell has lazy evaluation. Great for constructing infinite lists using the list comprehension technique.
SML has a truly interactive interpreter in which functions can be both defined and used. In Haskell, functions must be defined in a separate file and compiled before being used in the interactive shell.
SML gives explicit confirmation of the function argument and return types in a syntax that's easy to understand. For example: val foo = fn : int * int -> int. Haskell's implicit curry syntax is a bit more obtuse, but not totally alien. For example: foo :: Int -> Int -> Int.
Haskell uses arbitrary-precision integers by default. It's an external library in SML/NJ. And SML/NJ truncates output to 70 characters by default.
Haskell's lambda syntax is subtle -- it uses a single backslash. SML is more explicit. Not sure if we'll ever need lambda in this class, though.
Essentially, SML and Haskell are roughly equivalent. I lean toward Haskell because I'm loving the list comprehensions and infinite lists in Haskell. But I'm worried that the extensive number of symbols in Haskell's compact syntax might cause students problems. From what I've gathered reading other posts on SO, Haskell is not recommended for beginners starting out with FP. But we're not going to be building full-fledged applications, just trying out simple algorithms.
What do you think?
Edit: Upon reading some of your great responses, I should clarify some of my bullet points.
In SML, there's no syntactic distinction between defining a function in the interpreter and defining it in an external file. Let's say you want to write the factorial function. In Haskell you can put this definition into a file and load it into GHCi:
fac 0 = 1
fac n = n * fac (n-1)
To me, that's clear, succinct, and matches the mathematical definition in the book. But if you want to write the function in GHCi directly, you have to use a different syntax:
let fac 0 = 1; fac n = n * fac (n-1)
When working with interactive interpreters, from a teaching perspective it's very, very handy when the student can use the same code in both a file and the command line.
By "explicit confirmation of the function," I meant that upon defining the function, SML right away tells you the name of the function, the types of the arguments, and the return type. In Haskell you have to use the :type command and then you get the somewhat confusing curry notation.
One more cool thing about Haskell -- this is a valid function definition:
fac 0 = 1
fac (n+1) = (n+1) * fac n
Again, this matches a definition they might find in the textbook. Can't do that in SML!
Much as I love Haskell, here are the reasons I would prefer SML for a class in discrete math and data structures (and most other beginners' classes):
Time and space costs of Haskell programs can be very hard to predict, even for experts. SML offers much more limited ways to blow the machine.
Syntax for function defintion in an interactive interpreter is identical to syntax used in a file, so you can cut and paste.
Although operator overloading in SML is totally bogus, it is also simple. It's going to be hard to teach a whole class in Haskell without having to get into type classes.
Student can debug using print. (Although, as a commenter points out, it is possible to get almost the same effect in Haskell using Debug.Trace.trace.)
Infinite data structures blow people's minds. For beginners, you're better off having them define a stream type complete with ref cells and thunks, so they know how it works:
datatype 'a thunk_contents = UNEVALUATED of unit -> 'a
| VALUE of 'a
type 'a thunk = 'a thunk_contents ref
val delay : (unit -> 'a) -> 'a thunk
val force : 'a thunk -> 'a
Now it's not magic any more, and you can go from here to streams (infinite lists).
Layout is not as simple as in Python and can be confusing.
There are two places Haskell has an edge:
In core Haskell you can write a function's type signature just before its definition. This is hugely helpful for students and other beginners. There just isn't a nice way to deal with type signatures in SML.
Haskell has better concrete syntax. The Haskell syntax is a major improvement over ML syntax. I have written a short note about when to use parentheses in an ML program; this helps a little.
Finally, there is a sword that cuts both ways:
Haskell code is pure by default, so your students are unlikely to stumble over impure constructs (IO monad, state monad) by accident. But by the same token, they can't print, and if you want to do I/O then at minumum you have to explain do notation, and return is confusing.
On a related topic, here is some advice for your course preparation: don't overlook Purely Functional Data Structures by Chris Okasaki. Even if you don't have your students use it, you will definitely want to have a copy.
We teach Haskell to first years at our university. My feelings about this are a bit mixed. On the one hand teaching Haskell to first years means they don't have to unlearn the imperative style. Haskell can also produce very concise code which people who had some Java before can appreciate.
Some problems I've noticed students often have:
Pattern matching can be a bit difficult, at first. Students initially had some problems seeing how value construction and pattern matching are related. They also had some problems distinguishing between abstractions. Our exercises included writing functions that simplify arithmetic expression and some students had difficulty seeing the difference between the abstract representation (e.g., Const 1) and the meta-language representation (1).
Furthermore, if your students are supposed to write list processing functions themselves, be careful pointing out the difference between the patterns
[]
[x]
(x:xs)
[x:xs]
Depending on how much functional programming you want to teach them on the way, you may just give them a few library functions and let them play around with that.
We didn't teach our students about anonymous functions, we simply told them about where clauses. For some tasks this was a bit verbose, but worked well otherwise. We also didn't tell them about partial applications; this is probably quite easy to explain in Haskell (due to its form of writing types) so it might be worth showing to them.
They quickly discovered list comprehensions and preferred them over higher-order functions like filter, map, zipWith.
I think we missed out a bit on teaching them how to let them guide their thoughts by the types. I'm not quite sure, though, whether this is helpful to beginners or not.
Error messages are usually not very helpful to beginners, they might occasionally need some help with these. I haven't tried it myself, but there's a Haskell compiler specifically targeted at newcomers, mainly by means of better error messages: Helium
For the small programs, things like possible space leaks weren't an issue.
Overall, Haskell is a good teaching language, but there are a few pitfalls. Given that students feel a lot more comfortable with list comprehensions than higher-order functions, this might be the argument you need. I don't know how long your course is or how much programming you want to teach them, but do plan some time for teaching them basic concepts--they will need it.
BTW,
# SML has a truly interactive
interpreter in which functions can be
both defined and used. In Haskell,
functions must be defined in a
separate file and compiled before
being used in the interactive shell.
Is inaccurate. Use GHCi:
Prelude> let f x = x ^ 2
Prelude> f 7
49
Prelude> f 2
4
There are also good resources for Haskell in education on the haskell.org edu. page, with experiences from different teachers. http://haskell.org/haskellwiki/Haskell_in_education
Finally, you'll be able to teach them multicore parallelism just for fun, if you use Haskell :-)
Many universities teach Haskell as a first functional language or even a first programming language, so I don't think this will be a problem.
Having done some of the teaching on one such course, I don't agree that the possible confusions you identify are that likely. The most likely sources of early confusion are parsing errors caused by bad layout, and mysterious messages about type classes when numeric literals are used incorrectly.
I'd also disagree with any suggestion that Haskell is not recommended for beginners starting out with FP. It's certainly the big bang approach in ways that strict languages with mutation aren't, but I think that's a very valid approach.
SML has a truly interactive interpreter in which functions can be both defined and used. In Haskell, functions must be defined in a separate file and compiled before being used in the interactive shell.
While Hugs may have that limitation, GHCi does not:
$ ghci
GHCi, version 6.10.1: http://www.haskell.org/ghc/ :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer ... linking ... done.
Loading package base ... linking ... done.
Prelude> let hello name = "Hello, " ++ name
Prelude> hello "Barry"
"Hello, Barry"
There's many reasons I prefer GHC(i) over Hugs, this is just one of them.
SML gives explicit confirmation of the function argument and return types in a syntax that's easy to understand. For example: val foo = fn : int * int -> int. Haskell's implicit curry syntax is a bit more obtuse, but not totally alien. For example: foo :: Int -> Int -> Int.
SML has what you call "implicit curry" syntax as well.
$ sml
Standard ML of New Jersey v110.69 [built: Fri Mar 13 16:02:47 2009]
- fun add x y = x + y;
val add = fn : int -> int -> int
Essentially, SML and Haskell are roughly equivalent. I lean toward Haskell because I'm loving the list comprehensions and infinite lists in Haskell. But I'm worried that the extensive number of symbols in Haskell's compact syntax might cause students problems. From what I've gathered reading other posts on SO, Haskell is not recommended for beginners starting out with FP. But we're not going to be building full-fledged applications, just trying out simple algorithms.
I like using Haskell much more than SML, but I would still teach SML first.
Seconding nominolo's thoughts, list comprehensions do seem to slow students from getting to some higher-order functions.
If you want laziness and infinite lists, it's instructive to implement it explicitly.
Because SML is eagerly evaluated, the execution model is far easier to comprehend, and "debugging via printf" works a lot better than in Haskell.
SML's type system is also simpler. While your class likely wouldn't use them anyways, Haskell's typeclasses are still an extra bump to get over -- getting them to understand the 'a versus ''a distinction in SML is tough enough.
Most answers were technical, but I think you should consider at least one that is not: Haskell (as OCaml), at this time, has a bigger community using it in a wider range of contexts. There's also a big database of libraries and applications written for profit and fun at Hackage. That may be an important factor in keeping some of your students using the language after your course is finished, and maybe trying other functional languages (like Standard ML) later.
I am amazed you are not considering OCaml and F# given that they address so many of your concerns. Surely decent and helpful development environments are a high priority for learners? SML is way behind and F# is way ahead of all other FPLs in that respect.
Also, both OCaml and F# have list comprehensions.
Haskell. I'm ahead in my algos/theory class in CS because of the stuff I learned from using Haskell. It's such a comprehensive language, and it will teach you a ton of CS, just by using it.
However, SML is much easier to learn. Haskell has features such as lazy evaluation and control structures that make it much more powerful, but with the cost of a steep(ish) learning curve. SML has no such curve.
That said, most of Haskell was unlearning stuff from less scientific/mathematic languages such as Ruby, ObjC, or Python.

What's the fuss about Haskell? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 11 years ago.
I know a few programmers who keep talking about Haskell when they are among themselves, and here on SO everyone seems to love that language. Being good at Haskell seems somewhat like the hallmark of a genius programmer.
Can someone give a few Haskell examples that show why it is so elegant / superior?
This is the example that convinced me to learn Haskell (and boy am I glad I did).
-- program to copy a file --
import System.Environment
main = do
--read command-line arguments
[file1, file2] <- getArgs
--copy file contents
str <- readFile file1
writeFile file2 str
OK, it's a short, readable program. In that sense it's better than a C program. But how is this so different from (say) a Python program with a very similar structure?
The answer is lazy evaluation. In most languages (even some functional ones), a program structured like the one above would result in the entire file being loaded into memory, and then written out again under a new name.
Haskell is "lazy". It doesn't calculate things until it needs to, and by extension doesn't calculate things it never needs. For instance, if you were to remove the writeFile line, Haskell wouldn't bother reading anything from the file in the first place.
As it is, Haskell realises that the writeFile depends on the readFile, and so is able to optimise this data path.
While the results are compiler-dependent, what will typically happen when you run the above program is this: the program reads a block (say 8KB) of the first file, then writes it to the second file, then reads another block from the first file, and writes it to the second file, and so on. (Try running strace on it!)
... which looks a lot like what the efficient C implementation of a file copy would do.
So, Haskell lets you write compact, readable programs - often without sacrificing a lot of performance.
Another thing I must add is that Haskell simply makes it difficult to write buggy programs. The amazing type system, lack of side-effects, and of course the compactness of Haskell code reduces bugs for at least three reasons:
Better program design. Reduced complexity leads to fewer logic errors.
Compact code. Fewer lines for bugs to exist on.
Compile errors. Lots of bugs just aren't valid Haskell.
Haskell isn't for everyone. But everyone should give it a try.
The way it was pitched to me, and what I think is true after having worked on learning on Haskell for a month now, is the fact that functional programming twists your brain in interesting ways: it forces you to think about familiar problems in different ways: instead of loops, think in maps and folds and filters, etc. In general, if you have more than one perspective on a problem, it makes you better enabled to reason about this problem, and switch viewpoints as necessary.
The other really neat thing about Haskell is its type system. It's strictly typed, but the type inference engine makes it feel like a Python program that magically tells you when you've done a stupid type-related mistake. Haskell's error messages in this regard are somewhat lacking, but as you get more acquainted with the language you'll say to yourself: this is what typing is supposed to be!
You are kind of asking the wrong question.
Haskell is not a language where you go look at a few cool examples and go "aha, I see now, that's what makes it good!"
It's more like, we have all these other programming languages, and they're all more or less similar, and then there's Haskell which is totally different and wacky in a way that's totally awesome once you get used to the wackiness. But the problem is, it takes quite a while to acclimate to the wackiness. Things that set Haskell apart from almost any other even-semi-mainstream language:
Lazy evaluation
No side effects (everything is pure, IO/etc happens via monads)
Incredibly expressive static type system
as well as some other aspects that are different from many mainstream languages (but shared by some):
functional
significant whitespace
type inferred
As some other posters have answered, the combination of all these features means that you think about programming in an entirely different way. And so it's hard to come up with an example (or set of examples) that adequately communicates this to Joe-mainstream-programmer. It's an experiential thing. (To make an analogy, I can show you photos of my 1970 trip to China, but after seeing the photos, you still won't know what it was like to have lived there during that time. Similarly, I can show you a Haskell 'quicksort', but you still won't know what it means to be a Haskeller.)
What really sets Haskell apart is the effort it goes to in its design to enforce functional programming. You can program in a functional style in pretty much any language, but it's all too easy to abandon at the first convenience. Haskell does not allow you to abandon functional programming, so you must take it to its logical conclusion, which is a final program that is easier to reason about, and sidesteps a whole class of the thorniest types of bugs.
When it comes to writing a program for real world use, you may find Haskell lacking in some practical fashion, but your final solution will be better for having known Haskell to begin with. I'm definitely not there yet, but so far learning Haskell has been much more enlightening than say, Lisp was in college.
Part of the fuss is that purity and static typing enable for parallelism combined with aggressive optimisations. Parallel languages are hot now with multicore being a bit disruptive.
Haskell gives you more options for parallelism than pretty much any general purpose language, along with a fast, native code compiler. There is really no competition with this kind of support for parallel styles:
semi-implicit parallelism via thread sparks
explicit threads
data parallel arrays
actors and message passing
transactional memory
So if you care about making your multicore work, Haskell has something to say.
A great place to start is with Simon Peyton Jones' tutorial on parallel and concurrent programming in Haskell.
I've spent the last year learning Haskell and writing a reasonably large and complex project in it. (The project is an automated options trading system, and everything from the trading algorithms to the parsing and handling of low-level, high-speed market data feeds is done in Haskell.) It's considerably more concise and easier to understand (for those with appropriate background) than a Java version would be, as well as extremely robust.
Possibly the biggest win for me has been the ability to modularize control flow through things such as monoids, monads, and so on. A very simple example would be the Ordering monoid; in an expression such as
c1 `mappend` c2 `mappend` c3
where c1 and so on return LT, EQ or GT, c1 returning EQ causes the expression to continue, evaluating c2; if c2 returns LT or GT that's the value of the whole, and c3 is not evaluated. This sort of thing gets considerably more sophisticated and complex in things like monadic message generators and parsers where I may be carrying around different types of state, have varying abort conditions, or may want to be able to decide for any particular call whether abort really means "no further processing" or means, "return an error at the end, but carry on processing to collect further error messages."
This is all stuff it takes some time and probably quite some effort to learn, and thus it can be hard to make a convincing argument for it for those who don't already know these techniques. I think that the All About Monads tutorial gives a pretty impressive demonstration of one facet of this, but I wouldn't expect that anybody not familiar with the material already would "get it" on the first, or even the third, careful reading.
Anyway, there's lots of other good stuff in Haskell as well, but this is a major one that I don't see mentioned so often, probably because it's rather complex.
Software Transactional Memory is a pretty cool way to deal with concurrency. It's much more flexible than message passing, and not deadlock prone like mutexes. GHC's implementation of STM is considered one of the best.
For an interesting example you can look at:
http://en.literateprograms.org/Quicksort_(Haskell)
What is interesting is to look at the implementation in various languages.
What makes Haskell so interesting, along with other functional languages, is the fact that you have to think differently about how to program. For example, you will generally not use for or while loops, but will use recursion.
As is mentioned above, Haskell and other functional languages excel with parallel processing and writing applications to work on multi-cores.
I couldn't give you an example, I'm an OCaml guy, but when I'm in such a situation as yourself, curiosity just takes hold and I have to download a compiler/interpreter and give it a go. You'll likely learn far more that way about the strengths and weaknesses of a given functional language.
One thing I find very cool when dealing with algorithms or mathematical problems is Haskell's inherent lazy evaluation of computations, which is only possible due to its strict functional nature.
For example, if you want to calculate all primes, you could use
primes = sieve [2..]
where sieve (p:xs) = p : sieve [x | x<-xs, x `mod` p /= 0]
and the result is actually an infinite list. But Haskell will evaluate it left from right, so as long as you don't try to do something that requires the entire list, you can can still use it without the program getting stuck in infinity, such as:
foo = sum $ takeWhile (<100) primes
which sums all primes less than 100. This is nice for several reasons. First of all, I only need to write one prime function that generates all primes and then I'm pretty much ready to work with primes. In an object-oriented programming language, I would need some way to tell the function how many primes it should compute before returning, or emulate the infinite list behavior with an object. Another thing is that in general, you end up writing code that expresses what you want to compute and not in which order to evaluate things - instead the compiler does that for you.
This is not only useful for infinite lists, in fact it gets used without you knowing it all the time when there is no need to evaluate more than necessary.
I find that for certain tasks I am incredibly productive with Haskell.
The reason is because of the succinct syntax and the ease of testing.
This is what the function declaration syntax is like:
foo a = a + 5
That's is simplest way I can think of defining a function.
If I write the inverse
inverseFoo a = a - 5
I can check that it is an inverse for any random input by writing
prop_IsInverse :: Double -> Bool
prop_IsInverse a = a == (inverseFoo $ foo a)
And calling from the command line
jonny#ubuntu: runhaskell quickCheck +names fooFileName.hs
Which will check that all the properties in my file are held, by randomly testing inputs a hundred times of so.
I don't think Haskell is the perfect language for everything, but when it comes to writing little functions and testing, I haven't seen anything better. If your programming has a mathematical component this is very important.
I agree with others that seeing a few small examples is not the best way to show off Haskell. But I'll give some anyway. Here's a lightning-fast solution to Euler Project problems 18 and 67, which ask you to find the maximum-sum path from the base to the apex of a triangle:
bottomUp :: (Ord a, Num a) => [[a]] -> a
bottomUp = head . bu
where bu [bottom] = bottom
bu (row : base) = merge row $ bu base
merge [] [_] = []
merge (x:xs) (y1:y2:ys) = x + max y1 y2 : merge xs (y2:ys)
Here is a complete, reusable implementation of the BubbleSearch algorithm by Lesh and Mitzenmacher. I used it to pack large media files for archival storage on DVD with no waste:
data BubbleResult i o = BubbleResult { bestResult :: o
, result :: o
, leftoverRandoms :: [Double]
}
bubbleSearch :: (Ord result) =>
([a] -> result) -> -- greedy search algorithm
Double -> -- probability
[a] -> -- list of items to be searched
[Double] -> -- list of random numbers
[BubbleResult a result] -- monotone list of results
bubbleSearch search p startOrder rs = bubble startOrder rs
where bubble order rs = BubbleResult answer answer rs : walk tries
where answer = search order
tries = perturbations p order rs
walk ((order, rs) : rest) =
if result > answer then bubble order rs
else BubbleResult answer result rs : walk rest
where result = search order
perturbations :: Double -> [a] -> [Double] -> [([a], [Double])]
perturbations p xs rs = xr' : perturbations p xs (snd xr')
where xr' = perturb xs rs
perturb :: [a] -> [Double] -> ([a], [Double])
perturb xs rs = shift_all p [] xs rs
shift_all p new' [] rs = (reverse new', rs)
shift_all p new' old rs = shift_one new' old rs (shift_all p)
where shift_one :: [a] -> [a] -> [Double] -> ([a]->[a]->[Double]->b) -> b
shift_one new' xs rs k = shift new' [] xs rs
where shift new' prev' [x] rs = k (x:new') (reverse prev') rs
shift new' prev' (x:xs) (r:rs)
| r <= p = k (x:new') (prev' `revApp` xs) rs
| otherwise = shift new' (x:prev') xs rs
revApp xs ys = foldl (flip (:)) ys xs
I'm sure this code looks like random gibberish. But if you read Mitzenmacher's blog entry and understand the algorithm, you'll be amazed that it's possible to package the algorithm into code without saying anything about what you're searching for.
Having given you some examples as you asked for, I will say that the best way to start to appreciate Haskell is to read the paper that gave me the ideas I needed to write the DVD packer: Why Functional Programming Matters by John Hughes. The paper actually predates Haskell, but it brilliantly explains some of the ideas that make people like Haskell.
For me, the attraction of Haskell is the promise of compiler guaranteed correctness. Even if it is for pure parts of the code.
I have written a lot of scientific simulation code, and have wondered so many times if there was a bug in my prior codes, which could invalidate a lot of current work.
it has no loop constructs. not many languages have this trait.
If you can wrap your head around the type system in Haskell I think that in itself is quite an accomplishment.
I agree with those that said that functional programming twists your brain into seeing programming from a different angle. I've only used it as a hobbyist, but I think it fundamentally changed the way I approach a problem. I don't think I would have been nearly as effective with LINQ without having been exposed to Haskell (and using generators and list comprehensions in Python).
To air a contrarian view: Steve Yegge writes that Hindely-Milner languages lack the flexibility required to write good systems:
H-M is very pretty, in a totally
useless formal mathematical sense. It
handles a few computation constructs
very nicely; the pattern matching
dispatch found in Haskell, SML and
OCaml is particularly handy.
Unsurprisingly, it handles some other
common and highly desirable constructs
awkwardly at best, but they explain
those scenarios away by saying that
you're mistaken, you don't actually
want them. You know, things like, oh,
setting variables.
Haskell is worth learning, but it has its own weaknesses.

Resources