A classic programming exercise is to write a Lisp/Scheme interpreter in Lisp/Scheme. The power of the full language can be leveraged to produce an interpreter for a subset of the language.
Is there a similar exercise for Haskell? I'd like to implement a subset of Haskell using Haskell as the engine. Of course it can be done, but are there any online resources available to look at?
Here's the backstory.
I am exploring the idea of using Haskell as a language to explore some of the concepts in a Discrete Structures course I am teaching. For this semester I have settled on Miranda, a smaller language that inspired Haskell. Miranda does about 90% of what I'd like it to do, but Haskell does about 2000%. :)
So my idea is to create a language that has exactly the features of Haskell that I'd like and disallows everything else. As the students progress, I can selectively "turn on" various features once they've mastered the basics.
Pedagogical "language levels" have been used successfully to teach Java and Scheme. By limiting what they can do, you can prevent them from shooting themselves in the foot while they are still mastering the syntax and concepts you are trying to teach. And you can offer better error messages.
I love your goal, but it's a big job. A couple of hints:
I've worked on GHC, and you don't want any part of the sources. Hugs is a much simpler, cleaner implementation but unfortunately it's in C.
It's a small piece of the puzzle, but Mark Jones wrote a beautiful paper called Typing Haskell in Haskell which would be a great starting point for your front end.
Good luck! Identifying language levels for Haskell, with supporting evidence from the classroom, would be of great benefit to the community and definitely a publishable result!
There is a complete Haskell parser: http://hackage.haskell.org/package/haskell-src-exts
Once you've parsed it, stripping out or disallowing certain things is easy. I did this for tryhaskell.org to disallow import statements, to support top-level definitions, etc.
Just parse the module:
parseModule :: String -> ParseResult Module
Then you have an AST for a module:
Module SrcLoc ModuleName [ModulePragma] (Maybe WarningText) (Maybe [ExportSpec]) [ImportDecl] [Decl]
The Decl type is extensive: http://hackage.haskell.org/packages/archive/haskell-src-exts/1.9.0/doc/html/Language-Haskell-Exts-Syntax.html#t%3ADecl
All you need to do is define a white-list of which declarations, imports, symbols, and syntax are available, then walk the AST and throw a "parse error" on anything you don't want them to be aware of yet. You can use the SrcLoc value attached to every node in the AST (a sketch of such a check follows the definition below):
data SrcLoc = SrcLoc
    { srcFilename :: String
    , srcLine     :: Int
    , srcColumn   :: Int
    }
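For example, a minimal sketch of such a filter, written against the unannotated AST shown above (importLoc is the ImportDecl location field as in the linked 1.9.0 API; newer releases use an annotated AST instead), could reject import declarations like this:

import Language.Haskell.Exts

checkModule :: String -> Either String Module
checkModule src =
    case parseModule src of
      ParseFailed loc err -> Left (show (srcLine loc) ++ ": " ++ err)
      ParseOk m@(Module _ _ _ _ _ imports _) ->
          case imports of
            []      -> Right m
            (i : _) -> Left (show (srcLine (importLoc i))
                             ++ ": import statements are not allowed yet")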
There's no need to re-implement Haskell. If you want to provide more friendly compile errors, just parse the code, filter it, send it to the compiler, and parse the compiler output. If it's a "couldn't match expected type a against inferred a -> b" then you know it's probably too few arguments to a function.
Unless you really, really want to spend time implementing Haskell from scratch or messing with the internals of Hugs or some other implementation, I think you should just filter what gets passed to GHC. That way, if your students want to take their code base to the next step and write some real, fully fledged Haskell code, the transition is transparent.
Do you want to build your interpreter from scratch? Begin by implementing an easier functional language like the lambda calculus or a Lisp variant. For the latter there is quite a nice wikibook called Write Yourself a Scheme in 48 Hours, giving a cool and pragmatic introduction to parsing and interpretation techniques.
Interpreting Haskell by hand will be much more complex, since you'll have to deal with highly complex features like type classes, an extremely powerful type system (type inference!) and lazy evaluation (reduction techniques).
So you should define quite a small subset of Haskell to work with, and then maybe start by extending the Scheme example step by step.
Addition:
Note that in Haskell, you have full access to the interpreter's API (at least under GHC), including parsers, compilers and of course interpreters.
The package to use is hint (Language.Haskell.*). I have unfortunately neither found online tutorials on this nor tried it out by myself but it looks quite promising.
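For instance, a minimal sketch using hint's runInterpreter, setImports and eval (these are the library's actual entry points; the error handling here is deliberately crude) looks like this:

import Language.Haskell.Interpreter

main :: IO ()
main = do
    r <- runInterpreter $ do
           setImports ["Prelude"]    -- expose only the Prelude
           eval "map (*2) [1,2,3]"   -- evaluate a string of Haskell code
    case r of
      Left err   -> putStrLn ("error: " ++ show err)
      Right text -> putStrLn text    -- prints [2,4,6]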
create a language that has exactly the features of Haskell that I'd like and disallows everything else. As the students progress, I can selectively "turn on" various features once they've mastered the basics.
I suggest a simpler (as in less work involved) solution to this problem. Instead of creating a Haskell implementation where you can turn features off, wrap a Haskell compiler with a program that first checks that the code doesn't use any feature you disallow, and then uses the ready-made compiler to compile it.
That would be similar to HLint (and also kind of its opposite):
HLint (formerly Dr. Haskell) reads Haskell programs and suggests changes that hopefully make them easier to read. HLint also makes it easy to disable unwanted suggestions, and to add your own custom suggestions.
Implement your own HLint "suggestions" for the features you don't allow.
Disable all the standard HLint suggestions.
Make your wrapper run your modified HLint as a first step.
Treat HLint suggestions as errors: if HLint "complained", the program doesn't proceed to the compilation stage.
Baskell is a teaching implementation, http://hackage.haskell.org/package/baskell
You might start by picking just, say, the type system to implement. That's about as complicated as an interpreter for Scheme, http://hackage.haskell.org/package/thih
The EHC series of compilers is probably the best bet: it's actively developed and seems to be exactly what you want - a series of small lambda calculi compilers/interpreters culminating in Haskell '98.
But you could also look at the various languages developed in Pierce's Types and Programming Languages, or the Helium interpreter (a crippled Haskell intended for students http://en.wikipedia.org/wiki/Helium_(Haskell)).
If you're looking for a subset of Haskell that's easy to implement, you can do away with type classes and type checking. Without type classes, you don't need type inference to evaluate Haskell code.
I wrote a self-compiling Haskell subset compiler for a Code Golf challenge. It takes Haskell subset code on input and produces C code on output. I'm sorry there isn't a more readable version available; I lifted nested definitions by hand in the process of making it self-compiling.
For a student interested in implementing an interpreter for a subset of Haskell, I would recommend starting with the following features:
Lazy evaluation. If the interpreter is in Haskell, you might not have to do anything for this.
Function definitions with pattern-matched arguments and guards. Only worry about variable, cons, nil, and _ patterns.
Simple expression syntax:
Integer literals
Character literals
[] (nil)
Function application (left associative)
Infix : (cons, right associative)
Parentheses
Variable names
Function names
More concretely, write an interpreter that can run this:
-- tail :: [a] -> [a]
tail (_:xs) = xs
-- append :: [a] -> [a] -> [a]
append [] ys = ys
append (x:xs) ys = x : append xs ys
-- zipWith :: (a -> b -> c) -> [a] -> [b] -> [c]
zipWith f (a:as) (b:bs) = f a b : zipWith f as bs
zipWith _ _ _ = []
-- showList :: (a -> String) -> [a] -> String
showList _ [] = '[' : ']' : []
showList show (x:xs) = '[' : append (show x) (showItems show xs)
-- showItems :: (a -> String) -> [a] -> String
showItems show [] = ']' : []
showItems show (x:xs) = ',' : append (show x) (showItems show xs)
-- fibs :: [Int]
fibs = 0 : 1 : zipWith add fibs (tail fibs)
-- main :: String
main = showList showInt (take 40 fibs)
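-- (add, take, and showInt are assumed to be primitives supplied by the interpreter)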
Type checking is a crucial feature of Haskell. However, going from nothing to a type-checking Haskell compiler is very difficult. If you start by writing an interpreter for the above, adding type checking to it should be less daunting.
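To make that concrete, here is one hypothetical shape for such an interpreter's core data types (all names are mine, not from any existing package). If values are represented by ordinary Haskell data, the host language's own thunks supply the lazy evaluation:

data Pat  = PVar String | PCons Pat Pat | PNil | PWild

data Expr = EInt Integer      -- integer literal
          | EChar Char        -- character literal
          | ENil              -- []
          | EVar String       -- variable or function name
          | EApp Expr Expr    -- application, left associative
          | ECons Expr Expr   -- infix :, right associative

-- one equation of a function definition: name, argument patterns, body
data Equation = Equation String [Pat] Expr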
You might look at Happy (a yacc-like parser generator in Haskell) which has a Haskell parser.
This might be a good idea - make a tiny version of NetLogo in Haskell. Here is the tiny interpreter.
See if Helium would make a better base to build upon than standard Haskell.
Uhc/Ehc is a series of compilers enabling/disabling various Haskell features.
http://www.cs.uu.nl/wiki/Ehc/WebHome#What_is_UHC_And_EHC
I've been told that Idris has a fairly compact parser, not sure if it's really suitable for alteration, but it's written in Haskell.
Andrej Bauer's Programming Language Zoo has a small implementation of a purely functional programming language somewhat cheekily named "minihaskell". It is about 700 lines of OCaml, so very easy to digest.
The site also contains toy versions of ML-style, Prolog-style and OO programming languages.
Don't you think it would be easier to take the GHC sources and strip out what you don't want, than it would be to write your own Haskell interpreter from scratch? Generally speaking, there should be a lot less effort involved in removing features as opposed to creating/adding features.
GHC is written in Haskell anyway, so technically that keeps within your question of a Haskell interpreter written in Haskell.
It probably wouldn't be too hard to make the whole thing statically linked and then only distribute your customized GHCi, so that the students can't load other Haskell source modules. As to how much work it would take to prevent them from loading other Haskell object files, I have no idea. You might want to disable FFI too, if you have a bunch of cheaters in your classes :)
The reason why there are so many LISP interpreters is that LISP is basically a predecessor of JSON: a simple format to encode data. This makes the frontend part quite easy to handle. Compared to that, Haskell, especially with Language Extensions, is not the easiest language to parse.
These are some syntactical constructs that sound tricky to get right:
operators with configurable precedence, associativity, and fixity,
nested comments
layout rule
pattern syntax
do-blocks and desugaring to monadic code
Each of these, except maybe the operators, could be tackled by students after their Compiler Construction Course, but it would take the focus away from how Haskell actually works. In addition to that, you might not want to implement all syntactical constructs of Haskell directly, but instead implement passes to get rid of them. Which brings us to the literal core of the issue, pun fully intended.
My suggestion is to implement typechecking and an interpreter for Core instead of full Haskell. Both of these tasks are quite intricate by themselves already.
This language, while still a strongly typed functional language, is way less complicated to deal with in terms of optimization and code generation.
However, it is still independent from the underlying machine.
Therefore, GHC uses it as an intermediate language and translates most syntactic constructs of Haskell into it.
Additionally, you should not shy away from using GHC's (or another compiler's) frontend.
I'd not consider that as cheating since custom LISPs use the host LISP system's parser (at least during bootstrapping). Cleaning up Core snippets and presenting them to students, along with the original code, should allow you to give an overview of what the frontend does, and why it is preferable to not reimplement it.
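To give students a flavour, a definition as small as double x = x + x comes out of GHC's desugarer roughly as follows (schematic and hand-cleaned, not verbatim compiler output; the extra arguments are the type and the Num dictionary made explicit):

-- Haskell source:
--   double :: Num a => a -> a
--   double x = x + x
--
-- Core, schematically:
--   double = \ @a ($dNum :: Num a) (x :: a) -> (+) @a $dNum x x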
Here are a few links to the documentation of Core as used in GHC:
System FC: equality constraints and coercions
GHC/As a library
The Core type
I'm working on a project which involves optimizing certain constructs in a very small subset of Java, formalized in BNF.
If I were to do this in Java, I would use a combination of JTB and JavaCC, which builds an AST. Visitors are then used to manipulate the tree. But, given the vast libraries for parsing in Haskell (parsec, happy, alex etc.), I'm a bit confused about choosing the appropriate library.
So, simply put, when a language is specified in BNF, which library offers the easiest means to build an AST? And what is the best way to go about modifying this tree in idiomatic Haskell?
Well, in Haskell there are two main ways of parsing something: parser combinators or a parser generator. Since you already have a BNF, I'd suggest the latter.
A good combination is alex (a lexer generator) with happy (a parser generator); IIRC GHC's own lexer and parser are built with them, so you'd be in good company.
Next you'll have a big honking stack of data declarations to parse into:
data JavaClass = JavaClass
    { className  :: Name
    , interfaces :: [Name]
    , contents   :: [ClassContents]
    ...
    }
data ClassContents = M Method
                   | F Field
                   | IC InnerClass
and for expressions and whatever else you need. Finally you'll combine these into something like
data TopLevel = JC JavaClass
              | WhateverOtherForms
              | YouWillParse
Once you have this you'll have the entire AST represented as one TopLevel or a list of them, depending on how many classes/files you parse.
To proceed from here depends on what you want to do. There are a number of libraries such as syb (scrap your boilerplate) that let you write very concise tree traversals and modifications. lens is also an option. At a minimum check out Data.Traversable and Data.Foldable.
To modify the tree, you can do something as simple as
ignoreInnerClasses :: JavaClass -> JavaClass
ignoreInnerClasses c = c { contents = filter (not . isInner) (contents c) }
                   --  ^^^ that is called a record update
  where isInner (IC _) = True
        isInner _      = False
and then you could potentially use something like syb to write
everywhere (mkT ignoreInnerClasses) toplevel
which will traverse everything and apply ignoreInnerClasses to all JavaClasses. This is possible to do in lens and many other libraries too, but syb is very easy to read.
I've never used bnfc-meta (suggested by #phg), but I would strongly recommend you look into BNFC (on hackage: http://hackage.haskell.org/package/BNFC). The basic approach is that you write your grammar in an annotated BNF style, and it will automatically generate an AST, parser, and pretty-printer for the grammar.
How suitable BNFC is depends upon the complexity of your grammar. If it's not context-free, you'll likely have a difficult time making any progress (I did have some success hacking up context-sensitive extensions, but that code's likely bit-rotted by now). The other downside is that your AST will very directly reflect the grammar specification. But since you already have a BNF specification, adding the necessary annotations for BNFC should be rather straightforward, so it's probably the fastest way to get a usable AST. Even if you decide to go a different route, you might be able to take the generated data types as a starting point for a hand-written version.
Alex + Happy.
There are many approaches to modify/investigate the parsed terms (ASTs). The keyword to search for is "datatype-generic" programming. But beware: it is a complex topic ...
http://people.cs.uu.nl/andres/Rec/MutualRec.pdf
http://www.cs.uu.nl/wiki/GenericProgramming/Multirec
It has a generic implementation of the zipper available here:
http://hackage.haskell.org/packages/archive/zipper/0.3/doc/html/Generics-MultiRec-Zipper.html
Also checkout https://github.com/pascalh/Astview
You might also check out the Haskell Compiler Series which is nice as an introduction to using alex and happy to parse a subset of Java: http://bjbell.wordpress.com/haskell-compiler-series/.
Since your grammar can be expressed in BNF, it is context-free; provided it falls in the LALR class (as most practical programming-language grammars do), it can be parsed efficiently with a shift-reduce parser. Such parsers can be generated by the parser generator yacc/bison (C, C++), or by their Haskell equivalent, "Happy".
That's why I would use "Happy" in your case. It takes grammar rules in BNF form and generates a parser from it directly. The resulting parser will accept the language that is described by your grammar rules and produce an AST (abstract syntax tree). The Happy user guide is quite nice and gets you started quickly:
http://www.haskell.org/happy/doc/html/
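As a taste of the input format, here is a hypothetical minimal Happy grammar (the file name Calc.y and all identifiers are mine; a separate lexer producing the Token type is assumed):

{
module Calc (parseExpr, Expr(..), Token(..)) where
}

%name parseExpr
%tokentype { Token }
%error { parseError }

%token
  int { TInt $$ }
  '+' { TPlus }

%%

Expr : Expr '+' int { Add $1 (Lit $3) }
     | int          { Lit $1 }

{
data Token = TInt Int | TPlus deriving Show
data Expr  = Lit Int | Add Expr Expr deriving Show

parseError :: [Token] -> a
parseError ts = error ("parse error at " ++ show ts)
}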
To transform the resulting AST, generic programming is a good idea. Here is a classical explanation on how to do this in Haskell in a practical fashion, from scratch:
http://research.microsoft.com/en-us/um/people/simonpj/papers/hmap/
I have used exactly this to build a compiler for a small domain specific language, and it was a simple and concise solution.
I'm struggling with what Super Combinators are:
A supercombinator is either a constant, or a combinator which contains only supercombinators as subexpressions.
And also with what Constant Applicative Forms are:
Any super combinator which is not a lambda abstraction. This includes truly constant expressions such as 12, ((+) 1 2), [1,2,3] as well as partially applied functions such as ((+) 4). Note that this last example is equivalent under eta abstraction to \ x -> (+) 4 x which is not a CAF.
This is just not making any sense to me! Isn't ((+) 4) just as "truly constant" as 12? CAFs sound like values to my simple mind.
These Haskell wiki pages you reference are old, and I think unfortunately written. Particularly unfortunate is that they mix up CAFs and supercombinators. Supercombinators are interesting but unrelated to GHC. CAFs are still very much a part of GHC, and can be understood without reference to supercombinators.
So let's start with supercombinators. Combinators derive from combinatory logic, and, in the usage here, consist of functions which only apply the values passed in to one another in one or another form -- i.e. they combine their arguments. The most famous set of combinators are S, K, and I, which taken together are Turing-complete. Supercombinators, in this context, are functions built only of values passed in, combinators, and other supercombinators. Hence any supercombinator can be expanded, through substitution, into a plain old combinator.
Some compilers for functional languages (not GHC!) use combinators and supercombinators as intermediate steps in compilation. As with any similar compiler technology, the reason for doing this is to admit optimization analysis that is more easily performed in such a simplified, minimal language. One such core language built on supercombinators is Edwin Brady's Epic.
Constant Applicative Forms are something else entirely. They're a bit more subtle, and have a few gotchas. The way to think of them is as an aspect of compiler implementation with no separate semantic meaning but with a potentially profound effect on runtime performance. The following may not be a perfect description of a CAF, but it'll try to convey my intuition of what one is, since I haven't seen a really good description anywhere else for me to crib from. The clean "authoritative" description in the GHC Commentary Wiki reads as follows:
Constant Applicative Forms, or CAFs for short, are top-level values defined in a program. Essentially, they are objects that are not allocated dynamically at run-time but, instead, are part of the static data of the program.
That's a good start. Pure, functional, lazy languages can be thought of in some sense as a graph reduction machine. The first time you demand the value of a node, that forces its evaluation, which in turn can demand the values of subnodes, etc. Once a node is evaluated, the resultant value sticks around (although it does not have to stick around -- since this is a pure language we could always keep the subnodes live and recalculate with no semantic effect). A CAF is indeed just a value. But, in the context, a special kind of value -- one which the compiler can determine has a meaning entirely dependent on its subnodes. That is to say:
foo x = ...
  where thisIsACaf = [1..10::Int]

        thisIsNotACaf = [1..x::Int]

        thisIsAlsoNotACaf :: Num a => [a]
        thisIsAlsoNotACaf = [1..10] -- oops, polymorphic! the "num" dictionary is implicitly a parameter.

        thisCouldBeACaf = const [1..10::Int] x      -- requires a sufficiently smart compiler
        thisAlsoCouldBeACaf _ = [1..10::Int]        -- also requires a sufficiently smart compiler
So why do we care if things are CAFs? Basically because sometimes we really really don't want to recompute something (for example, a memotable!) and so want to make sure it is shared properly. Other times we really do want to recompute something (e.g. a huge boring easy to generate list -- such as the naturals -- which we're just walking over) and not have it stick around in memory forever. A combination of naming things and binding them under lets or writing them inline, etc. typically lets us specify these sorts of things in a natural, intuitive way. Occasionally, however, the compiler is smarter or dumber than we expect, and something we think should only be computed once is always recomputed, or something we don't want to hang on to gets lifted out as a CAF. Then, we need to think things through more carefully. See this discussion to get an idea about some of the trickiness involved: A good way to avoid "sharing"?
[By the way, I don't feel up to it, but anyone that wants to should feel free to take as much of this answer as they want to try and integrate it with the existing Haskell Wiki pages and improve/update them]
Matt is right in that the definition is confusing. It is even contradictory. A CAF is defined as:
Any super combinator which is not a lambda abstraction. This includes truly constant expressions such as 12, ((+) 1 2), [1,2,3] as well as partially applied functions such as ((+) 4).
Hence, ((+) 4) is seen as a CAF. But in the very next sentence we're told it is equivalent to something that is not a CAF:
this last example is equivalent under eta abstraction to \ x -> (+) 4 x which is not a CAF.
It would be cleaner to rule out partially applied functions on the ground that they are equivalent to lambda abstractions.
This would be useful for genetic programming, which usually uses a Lisp subset as the representation for programs.
I've found something called Liskell (Lisp syntax, Haskell inside) on the web, but the links are broken and I can't find the paper on it...
Check out Lisk, which was designed to fix the author's gripes with Liskell.
In my spare time I’m working on a project called Lisk. Using the -pgmF option for GHC, you can provide GHC a program name that is called to preprocess the file before GHC compiles it. It also works in GHCi and imports. You use it like this:
{-# OPTIONS -F -pgmF lisk #-}
(module fibs
  (import system.environment)

  (:: main (io ()))
  (= main (>>= get-args (. print fib read head)))

  (:: test (-> :string (, :int :string)))
  (= test (, 1))

  (:: fib (-> :int :int))
  (= fib 0 0)
  (= fib 1 1)
  (= fib n (+ (fib (- n 1))
              (fib (- n 2)))))
The source is here.
Also, if you don't actually care about Haskell and just want some of its features, you might want to check out Qi (or its successor, Shen), which has s-expression syntax with many modern functional-programming features similar to Haskell.
You might be interested in a project I have been working on, husk scheme.
Basically it will let you call into Scheme code (S-expressions) from Haskell, and vice-versa. So you can intermingle that code within your program, and then process the s-expressions as native Haskell data types when you want to do something on the Haskell side.
Anyway, it may be useful to you, or not - have a look and decide for yourself.
In most genetic programming software, programs are represented as abstract syntax trees (AST) which are evaluated directly in that form. The Lisp S-expression syntax is only apparent when the programs are output as source code. Have you considered just modifying the output module in your chosen software to produce Haskell source code from the ASTs instead?
The obvious answer is "yes" -- unsurprising given that S-expressions were intended as a simple & uniform representation of parsed code. The thing is that languages like Haskell or ML tend to have some problems with that. I once did something similar to OCaml (abused CamlP4 and wrote some function that translates the P4 AST into some sexpr-like representation), and the fun begins when you run into similar kinds of AST nodes that have different types because they're not really the same... For example, there's function application, and there's a similar form that is used in patterns, and yet another form that is used in type expressions.
My guess is that trying to do genetic programming this way is likely to suffer from too much junk programs that don't have any meaning. But that's unsurprising too for any statically typed language -- a dynamically typed language will allow more junk in. Comparing the two worlds WRT to genetic programming might be interesting for reasons beyond the AI...
The Liskell paper is at http://clemens.endorphin.org/ILC07-Liskell-draft.pdf and the liskell.org site seems to still be up in general.
I'm going to be teaching a lower-division course in discrete structures. I have selected the text book Discrete Structures, Logic, and Computability in part because it contains examples and concepts that are conducive to implementation with a functional programming language. (I also think it's a good textbook.)
I want an easy-to-understand FP language to illustrate DS concepts and that the students can use. Most students will have had only one or two semesters of programming in Java, at best. After looking at Scheme, Erlang, Haskell, Ocaml, and SML, I've settled on either Haskell or Standard ML. I'm leaning towards Haskell for the reasons outlined below, but I'd like the opinion of those who are active programmers in one or the other.
Both Haskell and SML have pattern matching which makes describing a recursive algorithm a cinch.
Haskell has nice list comprehensions that match nicely with the way such lists are expressed mathematically.
Haskell has lazy evaluation. Great for constructing infinite lists using the list comprehension technique.
SML has a truly interactive interpreter in which functions can be both defined and used. In Haskell, functions must be defined in a separate file and compiled before being used in the interactive shell.
SML gives explicit confirmation of the function argument and return types in a syntax that's easy to understand. For example: val foo = fn : int * int -> int. Haskell's implicit curry syntax is a bit more obtuse, but not totally alien. For example: foo :: Int -> Int -> Int.
Haskell uses arbitrary-precision integers by default. It's an external library in SML/NJ. And SML/NJ truncates output to 70 characters by default.
Haskell's lambda syntax is subtle -- it uses a single backslash. SML is more explicit. Not sure if we'll ever need lambda in this class, though.
Essentially, SML and Haskell are roughly equivalent. I lean toward Haskell because I'm loving the list comprehensions and infinite lists in Haskell. But I'm worried that the extensive number of symbols in Haskell's compact syntax might cause students problems. From what I've gathered reading other posts on SO, Haskell is not recommended for beginners starting out with FP. But we're not going to be building full-fledged applications, just trying out simple algorithms.
What do you think?
Edit: Upon reading some of your great responses, I should clarify some of my bullet points.
In SML, there's no syntactic distinction between defining a function in the interpreter and defining it in an external file. Let's say you want to write the factorial function. In Haskell you can put this definition into a file and load it into GHCi:
fac 0 = 1
fac n = n * fac (n-1)
To me, that's clear, succinct, and matches the mathematical definition in the book. But if you want to write the function in GHCi directly, you have to use a different syntax:
let fac 0 = 1; fac n = n * fac (n-1)
When working with interactive interpreters, from a teaching perspective it's very, very handy when the student can use the same code in both a file and the command line.
By "explicit confirmation of the function," I meant that upon defining the function, SML right away tells you the name of the function, the types of the arguments, and the return type. In Haskell you have to use the :type command and then you get the somewhat confusing curry notation.
One more cool thing about Haskell -- this is a valid function definition (using "n+k patterns", which were part of Haskell 98 but have since been removed in Haskell 2010):
fac 0 = 1
fac (n+1) = (n+1) * fac n
Again, this matches a definition they might find in the textbook. Can't do that in SML!
Much as I love Haskell, here are the reasons I would prefer SML for a class in discrete math and data structures (and most other beginners' classes):
Time and space costs of Haskell programs can be very hard to predict, even for experts. SML offers much more limited ways to blow the machine.
Syntax for function definition in an interactive interpreter is identical to the syntax used in a file, so you can cut and paste.
Although operator overloading in SML is totally bogus, it is also simple. It's going to be hard to teach a whole class in Haskell without having to get into type classes.
Student can debug using print. (Although, as a commenter points out, it is possible to get almost the same effect in Haskell using Debug.Trace.trace.)
Infinite data structures blow people's minds. For beginners, you're better off having them define a stream type complete with ref cells and thunks, so they know how it works:
datatype 'a thunk_contents = UNEVALUATED of unit -> 'a
                           | VALUE of 'a
type 'a thunk = 'a thunk_contents ref

val delay : (unit -> 'a) -> 'a thunk
val force : 'a thunk -> 'a
Now it's not magic any more, and you can go from here to streams (infinite lists).
Layout is not as simple as in Python and can be confusing.
There are two places Haskell has an edge:
In core Haskell you can write a function's type signature just before its definition. This is hugely helpful for students and other beginners. There just isn't a nice way to deal with type signatures in SML.
Haskell has better concrete syntax. The Haskell syntax is a major improvement over ML syntax. I have written a short note about when to use parentheses in an ML program; this helps a little.
Finally, there is a sword that cuts both ways:
Haskell code is pure by default, so your students are unlikely to stumble over impure constructs (IO monad, state monad) by accident. But by the same token, they can't print, and if you want to do I/O then at minimum you have to explain do notation, and return is confusing.
On a related topic, here is some advice for your course preparation: don't overlook Purely Functional Data Structures by Chris Okasaki. Even if you don't have your students use it, you will definitely want to have a copy.
We teach Haskell to first years at our university. My feelings about this are a bit mixed. On the one hand teaching Haskell to first years means they don't have to unlearn the imperative style. Haskell can also produce very concise code which people who had some Java before can appreciate.
Some problems I've noticed students often have:
Pattern matching can be a bit difficult, at first. Students initially had some problems seeing how value construction and pattern matching are related. They also had some problems distinguishing between levels of abstraction. Our exercises included writing functions that simplify arithmetic expressions, and some students had difficulty seeing the difference between the abstract representation (e.g., Const 1) and the meta-language representation (1).
Furthermore, if your students are supposed to write list processing functions themselves, be careful pointing out the difference between the patterns

[]
[x]
(x:xs)
[x:xs]

(the last one only type-checks for lists of lists, and matches a singleton list whose single element is itself non-empty -- an easy mistake to make).
Depending on how much functional programming you want to teach them on the way, you may just give them a few library functions and let them play around with that.
We didn't teach our students about anonymous functions, we simply told them about where clauses. For some tasks this was a bit verbose, but worked well otherwise. We also didn't tell them about partial applications; this is probably quite easy to explain in Haskell (due to its form of writing types) so it might be worth showing to them.
They quickly discovered list comprehensions and preferred them over higher-order functions like filter, map, zipWith.
I think we missed out a bit on teaching them how to let them guide their thoughts by the types. I'm not quite sure, though, whether this is helpful to beginners or not.
Error messages are usually not very helpful to beginners, they might occasionally need some help with these. I haven't tried it myself, but there's a Haskell compiler specifically targeted at newcomers, mainly by means of better error messages: Helium
For the small programs, things like possible space leaks weren't an issue.
Overall, Haskell is a good teaching language, but there are a few pitfalls. Given that students feel a lot more comfortable with list comprehensions than higher-order functions, this might be the argument you need. I don't know how long your course is or how much programming you want to teach them, but do plan some time for teaching them basic concepts--they will need it.
BTW,

SML has a truly interactive interpreter in which functions can be both defined and used. In Haskell, functions must be defined in a separate file and compiled before being used in the interactive shell.

is inaccurate. Use GHCi:
Prelude> let f x = x ^ 2
Prelude> f 7
49
Prelude> f 2
4
There are also good resources for Haskell in education on the haskell.org education page, with experiences from different teachers. http://haskell.org/haskellwiki/Haskell_in_education
Finally, you'll be able to teach them multicore parallelism just for fun, if you use Haskell :-)
Many universities teach Haskell as a first functional language or even a first programming language, so I don't think this will be a problem.
Having done some of the teaching on one such course, I don't agree that the possible confusions you identify are that likely. The most likely sources of early confusion are parsing errors caused by bad layout, and mysterious messages about type classes when numeric literals are used incorrectly.
I'd also disagree with any suggestion that Haskell is not recommended for beginners starting out with FP. It's certainly the big bang approach in ways that strict languages with mutation aren't, but I think that's a very valid approach.
SML has a truly interactive interpreter in which functions can be both defined and used. In Haskell, functions must be defined in a separate file and compiled before being used in the interactive shell.
While Hugs may have that limitation, GHCi does not:
$ ghci
GHCi, version 6.10.1: http://www.haskell.org/ghc/ :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer ... linking ... done.
Loading package base ... linking ... done.
Prelude> let hello name = "Hello, " ++ name
Prelude> hello "Barry"
"Hello, Barry"
There are many reasons I prefer GHC(i) over Hugs; this is just one of them.
SML gives explicit confirmation of the function argument and return types in a syntax that's easy to understand. For example: val foo = fn : int * int -> int. Haskell's implicit curry syntax is a bit more obtuse, but not totally alien. For example: foo :: Int -> Int -> Int.
SML has what you call "implicit curry" syntax as well.
$ sml
Standard ML of New Jersey v110.69 [built: Fri Mar 13 16:02:47 2009]
- fun add x y = x + y;
val add = fn : int -> int -> int
Essentially, SML and Haskell are roughly equivalent. I lean toward Haskell because I'm loving the list comprehensions and infinite lists in Haskell. But I'm worried that the extensive number of symbols in Haskell's compact syntax might cause students problems. From what I've gathered reading other posts on SO, Haskell is not recommended for beginners starting out with FP. But we're not going to be building full-fledged applications, just trying out simple algorithms.
I like using Haskell much more than SML, but I would still teach SML first.
Seconding nominolo's thoughts, list comprehensions do seem to slow students from getting to some higher-order functions.
If you want laziness and infinite lists, it's instructive to implement it explicitly.
Because SML is eagerly evaluated, the execution model is far easier to comprehend, and "debugging via printf" works a lot better than in Haskell.
SML's type system is also simpler. While your class likely wouldn't use them anyways, Haskell's typeclasses are still an extra bump to get over -- getting them to understand the 'a versus ''a distinction in SML is tough enough.
Most answers were technical, but I think you should consider at least one that is not: Haskell (as OCaml), at this time, has a bigger community using it in a wider range of contexts. There's also a big database of libraries and applications written for profit and fun at Hackage. That may be an important factor in keeping some of your students using the language after your course is finished, and maybe trying other functional languages (like Standard ML) later.
I am amazed you are not considering OCaml and F# given that they address so many of your concerns. Surely decent and helpful development environments are a high priority for learners? SML is way behind and F# is way ahead of all other FPLs in that respect.
Also, both OCaml and F# have list comprehensions.
Haskell. I'm ahead in my algos/theory class in CS because of the stuff I learned from using Haskell. It's such a comprehensive language, and it will teach you a ton of CS, just by using it.
However, SML is much easier to learn. Haskell has features such as lazy evaluation and control structures that make it much more powerful, but with the cost of a steep(ish) learning curve. SML has no such curve.
That said, most of learning Haskell was unlearning habits from less scientific/mathematical languages such as Ruby, ObjC, or Python.
I know a few programmers who keep talking about Haskell when they are among themselves, and here on SO everyone seems to love that language. Being good at Haskell seems somewhat like the hallmark of a genius programmer.
Can someone give a few Haskell examples that show why it is so elegant / superior?
This is the example that convinced me to learn Haskell (and boy am I glad I did).
-- program to copy a file --
import System.Environment

main = do
  -- read command-line arguments
  [file1, file2] <- getArgs

  -- copy file contents
  str <- readFile file1
  writeFile file2 str
OK, it's a short, readable program. In that sense it's better than a C program. But how is this so different from (say) a Python program with a very similar structure?
The answer is lazy evaluation. In most languages (even some functional ones), a program structured like the one above would result in the entire file being loaded into memory, and then written out again under a new name.
Haskell is "lazy". It doesn't calculate things until it needs to, and by extension doesn't calculate things it never needs. For instance, if you were to remove the writeFile line, Haskell wouldn't bother reading anything from the file in the first place.
As it is, Haskell realises that the writeFile depends on the readFile, and so is able to optimise this data path.
While the results are compiler-dependent, what will typically happen when you run the above program is this: the program reads a block (say 8KB) of the first file, then writes it to the second file, then reads another block from the first file, and writes it to the second file, and so on. (Try running strace on it!)
... which looks a lot like what the efficient C implementation of a file copy would do.
So, Haskell lets you write compact, readable programs - often without sacrificing a lot of performance.
Another thing I must add is that Haskell simply makes it difficult to write buggy programs. The amazing type system, the lack of side effects, and of course the compactness of Haskell code reduce bugs for at least three reasons:
Better program design. Reduced complexity leads to fewer logic errors.
Compact code. Fewer lines for bugs to exist on.
Compile errors. Lots of bugs just aren't valid Haskell.
Haskell isn't for everyone. But everyone should give it a try.
The way it was pitched to me, and what I think is true after having worked on learning on Haskell for a month now, is the fact that functional programming twists your brain in interesting ways: it forces you to think about familiar problems in different ways: instead of loops, think in maps and folds and filters, etc. In general, if you have more than one perspective on a problem, it makes you better enabled to reason about this problem, and switch viewpoints as necessary.
The other really neat thing about Haskell is its type system. It's statically typed, but the type inference engine makes it feel like a Python program that magically tells you when you've made a stupid type-related mistake. Haskell's error messages in this regard are somewhat lacking, but as you get more acquainted with the language you'll say to yourself: this is what typing is supposed to be!
You are kind of asking the wrong question.
Haskell is not a language where you go look at a few cool examples and go "aha, I see now, that's what makes it good!"
It's more like, we have all these other programming languages, and they're all more or less similar, and then there's Haskell which is totally different and wacky in a way that's totally awesome once you get used to the wackiness. But the problem is, it takes quite a while to acclimate to the wackiness. Things that set Haskell apart from almost any other even-semi-mainstream language:
Lazy evaluation
No side effects (everything is pure, IO/etc happens via monads)
Incredibly expressive static type system
as well as some other aspects that are different from many mainstream languages (but shared by some):
functional
significant whitespace
type inferred
As some other posters have answered, the combination of all these features means that you think about programming in an entirely different way. And so it's hard to come up with an example (or set of examples) that adequately communicates this to Joe-mainstream-programmer. It's an experiential thing. (To make an analogy, I can show you photos of my 1970 trip to China, but after seeing the photos, you still won't know what it was like to have lived there during that time. Similarly, I can show you a Haskell 'quicksort', but you still won't know what it means to be a Haskeller.)
What really sets Haskell apart is the effort it goes to in its design to enforce functional programming. You can program in a functional style in pretty much any language, but it's all too easy to abandon at the first convenience. Haskell does not allow you to abandon functional programming, so you must take it to its logical conclusion, which is a final program that is easier to reason about, and sidesteps a whole class of the thorniest types of bugs.
When it comes to writing a program for real world use, you may find Haskell lacking in some practical fashion, but your final solution will be better for having known Haskell to begin with. I'm definitely not there yet, but so far learning Haskell has been much more enlightening than say, Lisp was in college.
Part of the fuss is that purity and static typing enable parallelism combined with aggressive optimisations. Parallel languages are hot now, with multicore being a bit disruptive.
Haskell gives you more options for parallelism than pretty much any general purpose language, along with a fast, native code compiler. There is really no competition with this kind of support for parallel styles:
semi-implicit parallelism via thread sparks (sketched after this list)
explicit threads
data parallel arrays
actors and message passing
transactional memory
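As a small sketch of the first style in that list: spark one computation with par while the main thread evaluates another (compile with ghc -threaded and run with +RTS -N2):

import Control.Parallel (par, pseq)

main :: IO ()
main = a `par` (b `pseq` print (a + b))
  where a = sum [1 .. 10000000 :: Int]   -- sparked, may run on another core
        b = sum [1 .. 20000000 :: Int]   -- evaluated by the main thread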
So if you care about making your multicore work, Haskell has something to say.
A great place to start is with Simon Peyton Jones' tutorial on parallel and concurrent programming in Haskell.
I've spent the last year learning Haskell and writing a reasonably large and complex project in it. (The project is an automated options trading system, and everything from the trading algorithms to the parsing and handling of low-level, high-speed market data feeds is done in Haskell.) It's considerably more concise and easier to understand (for those with appropriate background) than a Java version would be, as well as extremely robust.
Possibly the biggest win for me has been the ability to modularize control flow through things such as monoids, monads, and so on. A very simple example would be the Ordering monoid; in an expression such as
c1 `mappend` c2 `mappend` c3
where c1 and so on return LT, EQ or GT, c1 returning EQ causes the expression to continue, evaluating c2; if c2 returns LT or GT that's the value of the whole, and c3 is not evaluated. This sort of thing gets considerably more sophisticated and complex in things like monadic message generators and parsers where I may be carrying around different types of state, have varying abort conditions, or may want to be able to decide for any particular call whether abort really means "no further processing" or means, "return an error at the end, but carry on processing to collect further error messages."
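A throwaway illustration of that pattern (the names are mine): comparing records on one key, falling back to a second only on a tie.

import Data.Monoid (mappend)
import Data.Ord (comparing)

data Person = Person { name :: String, age :: Int }

-- compare by age; the name comparison only matters when the ages are EQ
byAgeThenName :: Person -> Person -> Ordering
byAgeThenName a b = comparing age a b `mappend` comparing name a b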
This is all stuff it takes some time and probably quite some effort to learn, and thus it can be hard to make a convincing argument for it for those who don't already know these techniques. I think that the All About Monads tutorial gives a pretty impressive demonstration of one facet of this, but I wouldn't expect that anybody not familiar with the material already would "get it" on the first, or even the third, careful reading.
Anyway, there's lots of other good stuff in Haskell as well, but this is a major one that I don't see mentioned so often, probably because it's rather complex.
Software Transactional Memory is a pretty cool way to deal with concurrency. It's much more flexible than message passing, and not deadlock prone like mutexes. GHC's implementation of STM is considered one of the best.
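A minimal sketch of the style, using the stm package's API: an atomic transfer that retries until the source balance suffices, with no explicit locks.

import Control.Concurrent.STM

transfer :: TVar Int -> TVar Int -> Int -> IO ()
transfer from to amount = atomically $ do
    balance <- readTVar from
    check (balance >= amount)            -- blocks/retries the transaction
    writeTVar from (balance - amount)
    modifyTVar' to (+ amount)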
For an interesting example you can look at:
http://en.literateprograms.org/Quicksort_(Haskell)
What is interesting is to look at the implementation in various languages.
What makes Haskell so interesting, along with other functional languages, is the fact that you have to think differently about how to program. For example, you will generally not use for or while loops, but will use recursion.
As is mentioned above, Haskell and other functional languages excel with parallel processing and writing applications to work on multi-cores.
I couldn't give you an example, I'm an OCaml guy, but when I'm in such a situation as yourself, curiosity just takes hold and I have to download a compiler/interpreter and give it a go. You'll likely learn far more that way about the strengths and weaknesses of a given functional language.
One thing I find very cool when dealing with algorithms or mathematical problems is Haskell's inherent lazy evaluation of computations, which is only possible due to its strict functional nature.
For example, if you want to calculate all primes, you could use
primes = sieve [2..]
  where sieve (p:xs) = p : sieve [x | x <- xs, x `mod` p /= 0]
and the result is actually an infinite list. But Haskell will evaluate it from left to right, so as long as you don't try to do something that requires the entire list, you can still use it without the program getting stuck in infinity, such as:
foo = sum $ takeWhile (<100) primes
which sums all primes less than 100. This is nice for several reasons. First of all, I only need to write one prime function that generates all primes and then I'm pretty much ready to work with primes. In an object-oriented programming language, I would need some way to tell the function how many primes it should compute before returning, or emulate the infinite list behavior with an object. Another thing is that in general, you end up writing code that expresses what you want to compute and not in which order to evaluate things - instead the compiler does that for you.
This is not only useful for infinite lists, in fact it gets used without you knowing it all the time when there is no need to evaluate more than necessary.
I find that for certain tasks I am incredibly productive with Haskell.
The reason is because of the succinct syntax and the ease of testing.
This is what the function declaration syntax is like:
foo a = a + 5
That's the simplest way I can think of to define a function.
If I write the inverse
inverseFoo a = a - 5
I can check that it is an inverse for any random input by writing
prop_IsInverse :: Double -> Bool
prop_IsInverse a = a == (inverseFoo $ foo a)
And calling from the command line
jonny@ubuntu: runhaskell quickCheck +names fooFileName.hs
which will check that all the properties in my file hold, by randomly testing inputs a hundred times or so.
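(With the current QuickCheck library the same check is usually run from GHCi instead: add import Test.QuickCheck to the file, load it into ghci, and call quickCheck prop_IsInverse at the prompt.)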
I don't think Haskell is the perfect language for everything, but when it comes to writing little functions and testing, I haven't seen anything better. If your programming has a mathematical component this is very important.
I agree with others that seeing a few small examples is not the best way to show off Haskell. But I'll give some anyway. Here's a lightning-fast solution to Euler Project problems 18 and 67, which ask you to find the maximum-sum path from the base to the apex of a triangle:
bottomUp :: (Ord a, Num a) => [[a]] -> a
bottomUp = head . bu
  where bu [bottom]      = bottom
        bu (row : base)  = merge row $ bu base
        merge [] [_]             = []
        merge (x:xs) (y1:y2:ys)  = x + max y1 y2 : merge xs (y2:ys)
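For instance, on the small four-row triangle from the problem statement, bottomUp [[3], [7,4], [2,4,6], [8,5,9,3]] evaluates to 23 (via the path 3 -> 7 -> 4 -> 9).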
Here is a complete, reusable implementation of the BubbleSearch algorithm by Lesh and Mitzenmacher. I used it to pack large media files for archival storage on DVD with no waste:
data BubbleResult i o = BubbleResult { bestResult :: o
                                     , result :: o
                                     , leftoverRandoms :: [Double]
                                     }

bubbleSearch :: (Ord result) =>
                ([a] -> result) ->      -- greedy search algorithm
                Double ->               -- probability
                [a] ->                  -- list of items to be searched
                [Double] ->             -- list of random numbers
                [BubbleResult a result] -- monotone list of results
bubbleSearch search p startOrder rs = bubble startOrder rs
    where bubble order rs = BubbleResult answer answer rs : walk tries
              where answer = search order
                    tries  = perturbations p order rs
                    walk ((order, rs) : rest) =
                        if result > answer then bubble order rs
                        else BubbleResult answer result rs : walk rest
                      where result = search order

perturbations :: Double -> [a] -> [Double] -> [([a], [Double])]
perturbations p xs rs = xr' : perturbations p xs (snd xr')
    where xr' = perturb xs rs
          perturb :: [a] -> [Double] -> ([a], [Double])
          perturb xs rs = shift_all p [] xs rs

shift_all p new' [] rs  = (reverse new', rs)
shift_all p new' old rs = shift_one new' old rs (shift_all p)
    where shift_one :: [a] -> [a] -> [Double] -> ([a]->[a]->[Double]->b) -> b
          shift_one new' xs rs k = shift new' [] xs rs
              where shift new' prev' [x] rs = k (x:new') (reverse prev') rs
                    shift new' prev' (x:xs) (r:rs)
                        | r <= p    = k (x:new') (prev' `revApp` xs) rs
                        | otherwise = shift new' (x:prev') xs rs

revApp xs ys = foldl (flip (:)) ys xs
I'm sure this code looks like random gibberish. But if you read Mitzenmacher's blog entry and understand the algorithm, you'll be amazed that it's possible to package the algorithm into code without saying anything about what you're searching for.
Having given you some examples as you asked for, I will say that the best way to start to appreciate Haskell is to read the paper that gave me the ideas I needed to write the DVD packer: Why Functional Programming Matters by John Hughes. The paper actually predates Haskell, but it brilliantly explains some of the ideas that make people like Haskell.
For me, the attraction of Haskell is the promise of compiler guaranteed correctness. Even if it is for pure parts of the code.
I have written a lot of scientific simulation code, and have wondered so many times if there was a bug in my prior codes, which could invalidate a lot of current work.
It has no loop constructs. Not many languages have this trait.
If you can wrap your head around the type system in Haskell I think that in itself is quite an accomplishment.
I agree with those that said that functional programming twists your brain into seeing programming from a different angle. I've only used it as a hobbyist, but I think it fundamentally changed the way I approach a problem. I don't think I would have been nearly as effective with LINQ without having been exposed to Haskell (and using generators and list comprehensions in Python).
To air a contrarian view: Steve Yegge writes that Hindley-Milner languages lack the flexibility required to write good systems:

H-M is very pretty, in a totally useless formal mathematical sense. It handles a few computation constructs very nicely; the pattern matching dispatch found in Haskell, SML and OCaml is particularly handy. Unsurprisingly, it handles some other common and highly desirable constructs awkwardly at best, but they explain those scenarios away by saying that you're mistaken, you don't actually want them. You know, things like, oh, setting variables.
Haskell is worth learning, but it has its own weaknesses.