How does Haskell compile zipper patterns?

"Learn You a Haskell" shows the following data type and then gives a bunch of algorithms that manipulate the trees using this.
data Crumb a = LeftCrumb a (Tree a) | RightCrumb a (Tree a) deriving (Show)
Unlike imperative languages, where something like binary search would be explained in terms of walking down pointers, here there is no mention of pointers at all. But how do algorithms like binary search get compiled in Haskell? Do they compile down to the same efficient pointer walking?
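For reference, the surrounding definitions in the book look roughly like this (a sketch from memory; the book later wraps the moves in Maybe):

data Tree a = Empty | Node a (Tree a) (Tree a) deriving (Show)

type Breadcrumbs a = [Crumb a]
type Zipper a = (Tree a, Breadcrumbs a)

-- Moving down stores the parent's value and the sibling we didn't take;
-- moving up reassembles the parent node from the stored crumb.
goLeft :: Zipper a -> Zipper a
goLeft (Node x l r, bs) = (l, LeftCrumb x r : bs)

goRight :: Zipper a -> Zipper a
goRight (Node x l r, bs) = (r, RightCrumb x l : bs)

goUp :: Zipper a -> Zipper a
goUp (t, LeftCrumb x r : bs)  = (Node x t r, bs)
goUp (t, RightCrumb x l : bs) = (Node x l t, bs)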

The Haskell language: Compilers can do whatever they want to the code as long as it makes sense from the specification. This means that there can be pointer walking just like you'd expect in C, or there might not be. The language specification doesn't really care how the things are implemented, as long as they work like they are supposed to.
The GHC compiler: If you really want to know how GHC compiles your code in the end, I suggest learning to read C-- (pronounced "C-minus-minus") or assembly. You can get GHC to spit out C-- code with -ddump-cmm and assembly with -ddump-asm. Unless you are planning to start work on optimising the compiler though, I don't think this would be a very useful exercise.
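For example, to dump both for a single module (Zipper.hs is a placeholder file name):

ghc -O2 -ddump-cmm -ddump-asm Zipper.hs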
As a general rule, imperative code GHC writes looks very different from what a human would write. So probably no pointers in the sense you're thinking of. (And the cool thing is that it works out efficiently in the end anyway!)

Related

sizeof, offsetof, and alignment via TemplateHaskell

I wonder if someone has implemented analogues of the hsc2hs pragmas via TemplateHaskell? It feels like it should be doable, since TH runs on the target platform at compile time, and GHC always has a C compiler lying around. This could be useful as another way of generating Haskell wrappers for C structs and deriving stuff for them.
The question is: if there is such a library, please point me to it. Otherwise, tell me if I'm missing something and this is impossible or doesn't make sense.

Are there benefits of strong typing besides safety?

In the Haskell community, we are slowly adding features of dependent types. Dependent types are an advanced typing feature by which types can depend on values. Some languages like Agda and Idris already have them. It appears to be a very advanced feature requiring an advanced type system, until you realize that Python has had the dynamic-typing version of dependent types, which may or may not be actual dependent types, from the beginning.
For almost any program in a functional programming language, there is a way to represent it as an untyped lambda calculus term, no matter how advanced the typing. That's because typing only eliminates programs; it doesn't enable new ones.
Strong typing wins us safety: whole classes of errors that used to happen at run time can no longer happen at run time. This safety is rather nice. Besides this safety though, what does strong typing give you?
Are there additional benefits of a strong type system besides safety?
(Note that I'm not saying that strong typing is worthless. Safety is a huge benefit in and of itself. I'm just wondering if there are additional benefits.)
First, we need to talk a bit about the history of the simply typed lambda calculus.
There are two historical developments of the simply typed lambda calculus.
When Alonzo Church described the lambda calculus the types were baked in as part of the meaning / operational behavior of the terms.
When Haskell Curry described the lambda calculus the types were annotations put on the terms.
So we have the lambda calculus a la Church and the lambda calculus a la Curry. See https://en.wikipedia.org/wiki/Simply_typed_lambda_calculus#Intrinsic_vs._extrinsic_interpretations for more.
Ironically, the language Haskell, which is named after Curry, is based on a lambda calculus a la Church!
What this means is the types aren't simply annotations that rule out bad programs for you. They can "do stuff" too. Such types don't erase without leaving residue.
This shows up in Haskell's notion of type classes, which are really why Haskell is a language a la Church.
In Haskell, when I make a function
sort :: Ord a => [a] -> [a]
we're passing an object or dictionary for Ord a as an implicit first argument.
But you aren't forced to plumb that argument around yourself in the code, it is the job of the compiler to build that up and use it.
instance Ord Char
instance Ord Int
instance Ord a => Ord [a]
So if you use sort on a list of strings, which are themselves lists of Chars, the compiler will build up the dictionary by passing the Ord Char instance through the instance for Ord a => Ord [a] to get Ord [Char], which is the same as Ord String, and with that it can sort a list of strings.
If I compare the complexity of calling such a sort function in Haskell to calling it in C# or Java: calling sort above is a lot less verbose than manually building a LexicographicComparator<List<Char>> by passing an IComparator<Char> to its constructor and calling the function with an extra second argument.
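To make that concrete, here is a hand-rolled sketch of the dictionary passing the compiler performs behind the scenes (OrdDict, ordChar, ordList and sortWith are illustrative names; GHC's real dictionaries carry the full Ord interface):

import Data.List (sortBy)

-- A "dictionary" holding the comparison for a type.
newtype OrdDict a = OrdDict { cmp :: a -> a -> Ordering }

-- Corresponds to: instance Ord Char
ordChar :: OrdDict Char
ordChar = OrdDict compare

-- Corresponds to: instance Ord a => Ord [a]
-- It takes the element dictionary and builds the list dictionary.
ordList :: OrdDict a -> OrdDict [a]
ordList (OrdDict c) = OrdDict go
  where
    go [] []         = EQ
    go [] _          = LT
    go _  []         = GT
    go (x:xs) (y:ys) = case c x y of
      EQ -> go xs ys
      r  -> r

-- Corresponds to: sort :: Ord a => [a] -> [a]
sortWith :: OrdDict a -> [a] -> [a]
sortWith d = sortBy (cmp d)

main :: IO ()
main = print (sortWith (ordList ordChar) ["banana", "apple"])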
This shows us that programming with types can be significantly less verbose, because mechanisms like implicits and typeclasses can infer a large part of the code for your program during type checking.
On a more basic level, even the sizes of arguments can depend on types, unless you want to pay fairly massive costs for boxing everything in your language up so that it has a homogeneous representation.
This shows us that programming with types can be significantly more efficient, because it can use dedicated representations, rather than paying for boxed structures everywhere in your code. An int can't just be a machine integer, because it has to somehow look like everything else in the system. If you're willing to give up an order of magnitude or more worth of performance at runtime, then this may not matter to you.
Finally, once we have types "doing stuff" for us, it is often beneficial to consider the refactoring benefits that mere safety provides.
If I refactor the smaller set of code that remains, the compiler will rewrite all that type-class plumbing for me. It'll figure out the new ways it can rewrite the code to unbox more arguments. I'm not stuck elaborating all of this stuff by hand; I can leave these mundane tasks to the type-checker.
But even when I do change the types, I can move arguments around fairly willy-nilly, comfortable that the compiler will very likely catch my errors. Types give you "free theorems" which are like unit tests for whole classes of such errors.
On the other hand, once I lock down an API in a language like Python I'm deathly afraid of changing it, because it'll silently break at runtime for all my downstream dependencies! This leads to baroque APIs that lean heavily on easily bit-rotted keyword-arguments, and the API of something that evolves over time rarely resembles what you'd build out of the box if you had it to do over again. Consequently, even the mere safety concern has long-term impact in API design once you ever want people to build on top of your work, rather than simply replace it when it gets too unwieldy.
That's because typing only eliminates programs; it doesn't enable new ones.
This is not a correct statement. Type-classes make it possible to generate parts of your program from type-level information.
Consider two expressions:
readMaybe "15" :: Maybe Integer
readMaybe "15" :: Maybe Bool
Here I'm using the readMaybe function from the Text.Read module. At term level those expressions are identical, only their type annotations are different. However, the results they produce at runtime differ (Just 15 in the first case, Nothing in the second case).
This is because the compiler generates code for you from the static type information you have. To be more precise, it selects a suitable type class instance and passes its dictionary to the polymorphic function (readMaybe in our case).
This example is simple, but there are way more complex use cases. Using the mtl library you can write computations that run in different computational contexts (aka Monads). The compiler will automatically insert a lot of code that manages the computational contexts. In a dynamically typed language, you would have no static information to make this possible.
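For instance, here is a minimal mtl sketch (assuming the mtl package): the same tick computation runs in a pure State monad and in StateT over IO, and the compiler inserts the right plumbing from the types alone:

{-# LANGUAGE FlexibleContexts #-}
import Control.Monad.State

-- One computation, polymorphic in its context.
tick :: MonadState Int m => m Int
tick = do
  n <- get
  put (n + 1)
  return n

main :: IO ()
main = do
  print (runState (replicateM 3 tick) 0)   -- pure: ([0,1,2],3)
  xs <- evalStateT (replicateM 3 tick) 0   -- same code over IO
  print xs                                 -- [0,1,2]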
As you can see, static typing not only cuts off incorrect programs but also writes correct ones for you.
You need "safety" when you already know what and how you want to write. It's a very small part of what types are useful for. The most important thing about types is that they make your reasoning structured. When someone writes in Python a + b he doesn't see a and b as some abstract variables — he sees them as some numbers. Types are already there in the internal language of humans, Python just doesn't have a type system to talk about them. The actual question in the "typed vs untyped (unityped) programming" dispute is "do we want to reflect our internal structured concepts in a safe and explicit or unsafe and implicit way?". Types don't introduce new concepts — it's untyped reasoning forgets the existing ones.
When someone looks at a tree (I mean a real green one) he doesn't see every single leaf on it, but he doesn't treat it as an abstract nameless object as well. "A tree" — is an approximation that is good enough for most cases and that's why we have Hindley-Milner type systems, but sometimes you want to talk about a specific tree and you do want to look at leaves. And that's what dependent types give you: the ability to zoom. "A tree without leaves", "a tree in the forest", "a tree of a particular form"... Dependently typed programming is just another step towards how humans think.
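In Haskell terms you can approximate this "zooming" with GADTs (a sketch, not dependent types proper): a length-indexed vector whose type records an invariant, much like "a tree without leaves" would.

{-# LANGUAGE GADTs, DataKinds, KindSignatures #-}

data Nat = Z | S Nat

-- The length lives in the type.
data Vec (n :: Nat) a where
  Nil  :: Vec 'Z a
  Cons :: a -> Vec n a -> Vec ('S n) a

-- The type rules out taking the head of an empty vector.
vhead :: Vec ('S n) a -> a
vhead (Cons x _) = x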
On a less abstract note, I have a type checker for a toy dependently typed language, where all typing rules are expressed as constructors of a data type. You don't need to dive into the type checking procedure to understand the rules of the system. That's the power of "zooming": you can introduce as complex invariants as you want, thus distinguishing essential parts from not important ones.
Another example of the power dependent types give you is various forms of reflection. Look e.g. at Pierre-Évariste Dagand's thesis, which proves that
generic programming is just programming
And of course types are hints: many functions and abstractions I defined I would have defined in a far clumsier way in a weakly typed language, but the types suggested better alternatives.
There is just no question "What to choose: simple types or dependent types?". Dependent types are always better and they of course subsume simple types. The question is "What to choose: no types or dependent types?", but that question doesn't stand for me.
Refactoring. By having a strong type system you can safely refactor code and have the compiler tell you whether what you are doing even makes sense. The stronger the type system, the more refactoring errors are avoided. This of course means your code is a lot more maintainable.

How to build Abstract Syntax Trees from grammar specification in Haskell?

I'm working on a project which involves optimizing certain constructs in a very small subset of Java, formalized in BNF.
If I were to do this in Java, I would use a combination of JTB and JavaCC, which builds an AST. Visitors are then used to manipulate the tree. But, given the vast number of libraries for parsing in Haskell (parsec, happy, alex etc.), I'm a bit confused in choosing the appropriate one.
So, simply put, when a language is specified in BNF, which library offers the easiest means to build an AST? And what is the best way to go about modifying this tree in idiomatic Haskell?
Well, in Haskell there are two main ways of parsing something: parser combinators or a parser generator. Since you already have a BNF, I'd suggest the latter.
A good pair is alex (for the lexer) and happy (for the parser). GHC's own parser IIRC is written using them, so you'd be in good company.
Next you'll have a big honking stack of data declarations to parse into:
data JavaClass = JavaClass
  { className  :: Name
  , interfaces :: [Name]
  , contents   :: [ClassContents]
  -- ... and so on
  }
data ClassContents = M Method
                   | F Field
                   | IC InnerClass
and for expressions and whatever else you need. Finally you'll combine these into something like
data TopLevel = JC JavaClass
              | WhateverOtherForms
              | YouWillParse
Once you have this, you'll have the entire AST represented as one TopLevel or a list of them, depending on how many classes/files you parse.
To proceed from here depends on what you want to do. There are a number of libraries such as syb (scrap your boilerplate) that let you write very concise tree traversals and modifications. lens is also an option. At a minimum check out Data.Traversable and Data.Foldable.
To modify the tree, you can do something as simple as
ignoreInnerClasses :: JavaClass -> JavaClass
ignoreInnerClasses c = c{contents = filter (not . isInnerClass) (contents c)}
-- ^^^ that is called a record update
  where isInnerClass (IC _) = True
        isInnerClass _      = False
and then you could potentially use something like syb to write
everywhere (mkT ignoreInnerClasses) toplevel
which will traverse everything and apply ignoreInnerClasses to all JavaClasses. This is possible in lens and many other libraries too, but syb is very easy to read.
I've never used bnfc-meta (suggested by @phg), but I would strongly recommend you look into BNFC (on hackage: http://hackage.haskell.org/package/BNFC). The basic approach is that you write your grammar in an annotated BNF style, and it will automatically generate an AST, parser, and pretty-printer for the grammar.
How suitable BNFC is depends upon the complexity of your grammar. If it's not context-free, you'll likely have a difficult time making any progress (I did have some success hacking up context-sensitive extensions, but that code has likely bit-rotted by now). The other downside is that your AST will very directly reflect the grammar specification. But since you already have a BNF specification, adding the necessary annotations for BNFC should be rather straightforward, so it's probably the fastest way to get a usable AST. Even if you decide to go a different route, you might be able to take the generated data types as a starting point for a hand-written version.
Alex + Happy.
There are many approaches to modify/investigate the parsed terms (ASTs). The keyword to search for is "datatype-generic" programming. But beware: it is a complex topic ...
http://people.cs.uu.nl/andres/Rec/MutualRec.pdf
http://www.cs.uu.nl/wiki/GenericProgramming/Multirec
It has a generic implementation of the zipper available here:
http://hackage.haskell.org/packages/archive/zipper/0.3/doc/html/Generics-MultiRec-Zipper.html
Also checkout https://github.com/pascalh/Astview
You might also check out the Haskell Compiler Series which is nice as an introduction to using alex and happy to parse a subset of Java: http://bjbell.wordpress.com/haskell-compiler-series/.
Since your grammar can be expressed in BNF, it is context-free, and in practice such grammars usually fall in the class that is efficiently parseable with a shift-reduce parser (LALR grammars). Such efficient parsers can be generated by the parser generator yacc/bison (C, C++), or its Haskell equivalent "Happy".
That's why I would use "Happy" in your case. It takes grammar rules in BNF form and generates a parser from it directly. The resulting parser will accept the language that is described by your grammar rules and produce an AST (abstract syntax tree). The Happy user guide is quite nice and gets you started quickly:
http://www.haskell.org/happy/doc/html/
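For a flavor of what Happy input looks like, here is a toy grammar sketch (an addition-only expression language, not the question's Java subset; the token type is inlined for brevity):

{
module Parser (parseExpr, Expr(..), Token(..)) where
}

%name parseExpr
%tokentype { Token }
%error { parseError }

%token
  int { TInt $$ }
  '+' { TPlus }

%%

Expr : Expr '+' Term { Add $1 $3 }
     | Term          { $1 }

Term : int           { Lit $1 }

{
data Token = TInt Int | TPlus
data Expr  = Lit Int | Add Expr Expr deriving (Show)

parseError :: [Token] -> a
parseError _ = error "parse error"
}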
To transform the resulting AST, generic programming is a good idea. Here is a classical explanation on how to do this in Haskell in a practical fashion, from scratch:
http://research.microsoft.com/en-us/um/people/simonpj/papers/hmap/
I have used exactly this to build a compiler for a small domain specific language, and it was a simple and concise solution.

What's the next step to learning Haskell after monads?

I've been gradually learning Haskell, and even feel like I've got the hang of monads. However, there's still a lot of more exotic stuff that I barely understand, like Arrows, Applicative, etc. Although I'm picking up bits and pieces from Haskell code I've seen, it would be good to find a tutorial that really explains them wholly. (There seem to be dozens of tutorials on monads... but everything seems to finish straight after that!)
Here are a few of the resources that I've found useful after "getting the hang of" monads:
As SuperBloup noted, Brent Yorgey's Typeclassopedia is indispensable (and it does in fact cover arrows).
There's a ton of great stuff in Real World Haskell that could be considered "after monads": applicative parsing, monad transformers, and STM, for example.
John Hughes's "Generalizing Monads to Arrows" is a great resource that taught me as much about monads as it did about arrows (even though I thought that I already understood monads when I read it).
The "Yampa Arcade" paper is a good introduction to Functional Reactive Programming.
On type families: I've found working with them easier than reading about them. The vector-space package is one place to start, or you could look at the code from Oleg Kiselyov and Ken Shan's course on Haskell and natural language semantics.
Pick a couple of chapters of Chris Okasaki's Purely Functional Data Structures and work through them in detail.
Raymond Smullyan's To Mock a Mockingbird is a fantastically accessible introduction to combinatory logic that will change the way you write Haskell.
Read Gérard Huet's Functional Pearl on zippers. The code is OCaml, but it's useful (and not too difficult) to be able to translate OCaml to Haskell in your head when working through papers like this.
Most importantly, dig into the code of any Hackage libraries you find yourself using. If they're doing something with syntax or idioms or extensions that you don't understand, look it up.
Regarding type classes:
Applicative is actually simpler than Monad. I've recently said a few things about it elsewhere, but the gist is that it's about enhanced Functors that you can lift functions into. To get a feel for Applicative, you could try writing something using Parsec without using do notation--my experience has been that applicative style works better than monadic for straightforward parsers.
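For a taste, here is a tiny applicative-style Parsec sketch (the point parser and the Point type are illustrative):

import Text.Parsec
import Text.Parsec.String (Parser)

data Point = Point Int Int deriving (Show)

int :: Parser Int
int = read <$> many1 digit

-- Parses "(3,4)" without any do notation.
point :: Parser Point
point = Point <$> (char '(' *> int <* char ',') <*> (int <* char ')')

-- parse point "" "(3,4)"  ==>  Right (Point 3 4)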
Arrows are a very abstract way of working with things that are sort of like functions ("arrows" between types). They can be difficult to get your mind around until you stumble on something that's naturally Arrow-like. At one point I reinvented half of Control.Arrow (poorly) while writing interactive state machines with feedback loops.
You didn't mention it, but an oft-underrated, powerful type class is the humble Monoid. There are lots of places where monoid-like structure can be found. Take a look at the monoids package, for instance.
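A tiny example of why: monoids compose, so one mconcat can compute several summaries of a list in a single pass (Sum and Product are from Data.Monoid):

import Data.Monoid

summarize :: [Int] -> (Sum Int, Product Int)
summarize xs = mconcat [ (Sum x, Product x) | x <- xs ]

-- summarize [1,2,3]  ==>  (Sum {getSum = 6}, Product {getProduct = 6})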
Aside from type classes, I'd offer a very simple answer to your question: Write programs! The best way to learn is by doing, so pick something fun or useful and just make it happen.
In fact, many of the more abstract concepts--like Arrow--will probably make more sense if you come back to them later and find that, like me, they offer a tidy solution to a problem you've encountered but hadn't even realized could be abstracted out.
However, if you want something specific to shoot for, why not take a look at Functional Reactive Programming--this is a family of techniques that have a lot of promise, but there are a lot of open questions of what the best way to do it is.
Type classes like Monad, Applicative, Arrow, and Functor are great and all, and even greater for changing how you think about code than for the convenience of having functions generic over them. But there's a common misconception that the "next step" in Haskell is learning about more type classes and ways of structuring control flow. The next step is deciding what you want to write, and trying to write it, exploring what you need along the way.
And even if you understand Monads, that doesn't mean you've scratched the surface of what you can do with monadically structured code. Play with parser combinator libraries, or write your own. Explore why applicative notation is sometimes easier for them. Explore why limiting yourself to applicative parsers might be more efficient.
Look at logic or math problems and explore ways of implementing backtracking -- depth-first, breadth-first, etc. Explore the difference between ListT and LogicT and ChoiceT. Take a look at continuations.
Or do something completely different!
Far and away the most important thing you can do is explore more of Hackage. Grappling with the various exotic features of Haskell will perhaps let you find improved solutions to certain problems, while the libraries on Hackage will vastly expand your set of tools.
The best part about the Haskell ecosystem is that you get to balance learning surgically precise new abstraction techniques with learning how to use the giant buzz saws available to you on Hackage.
Start writing code. You'll learn necessary concepts as you go.
Beyond the language, to use Haskell effectively, you need to learn some real-world tools and techniques. Things to consider:
Cabal, a tool to manage dependencies, build and deploy Haskell applications*.
FFI (Foreign Function Interface) to use C libraries from your Haskell code**.
Hackage as a source of others' libraries.
How to profile and optimize.
Automatic testing frameworks (QuickCheck, HUnit).
*) cabal init helps to quick-start.
**) Currently, my favourite tool for FFI bindings is bindings-DSL.
As a single next step (rather than half a dozen "next steps"), I suggest that you learn to write your own type classes. Here are a couple of simple problems to get you started:
Writing some interesting instance declarations for QuickCheck. Say for example that you want to generate random trees that are in some way "interesting".
Move on to the following little problem: define functions /\, \/, and complement ("and", "or", & "not") that can be applied not just to Booleans but to predicates of arbitrary arity. (If you look carefully, you can find the answer to this one on SO.)
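One possible shape of the answer (a sketch; the class name and the lifting instance are illustrative): a type class for Boolean-like things, with an instance that lifts the operations pointwise through function types, covering predicates of any arity:

class Boolean b where
  (/\), (\/)  :: b -> b -> b
  complement  :: b -> b

instance Boolean Bool where
  (/\) = (&&)
  (\/) = (||)
  complement = not

-- Lifting through functions handles predicates of any arity.
instance Boolean b => Boolean (a -> b) where
  (f /\ g) x   = f x /\ g x
  (f \/ g) x   = f x \/ g x
  complement f = complement . f

main :: IO ()
main = print ((even /\ (> 10)) (12 :: Int))  -- True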
You know all you need to go forth and write code. But if you're looking for more Haskell-y things to learn about, may I suggest:
Type families. Very handy feature. It basically gives you a way to write functions on the level of types, which is handy when you're trying to write a function whose parameters are polymorphic in a very precise way. One such example:
-- requires the TypeFamilies and MultiParamTypeClasses extensions
data TTrue = TTrue
data FFalse = FFalse

class TypeLevelIf tf a b where
    type If tf a b
    weirdIfStatement :: tf -> a -> b -> If tf a b

instance TypeLevelIf TTrue a b where
    type If TTrue a b = a
    weirdIfStatement TTrue a b = a

instance TypeLevelIf FFalse a b where
    type If FFalse a b = b
    weirdIfStatement FFalse a b = b
This gives you a function that behaves like an if statement, but is able to return different types based on the truth value it is given.
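For example, weirdIfStatement TTrue 'x' (5 :: Int) has type Char, while weirdIfStatement FFalse 'x' (5 :: Int) has type Int.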
If you're curious about type-level programming, type families provide one avenue into this topic.
Template Haskell. This is a huge subject. It gives you a power similar to macros in C, but with much more type safety.
Learn about some of the leading Haskell libraries. I can't count how many times parsec has enabled me to write an insanely useful utility quickly. dons periodically publishes a list of popular libraries on hackage; check it out.
Contribute to GHC!
Write a haskell compiler :-).

Write a Haskell interpreter in Haskell

A classic programming exercise is to write a Lisp/Scheme interpreter in Lisp/Scheme. The power of the full language can be leveraged to produce an interpreter for a subset of the language.
Is there a similar exercise for Haskell? I'd like to implement a subset of Haskell using Haskell as the engine. Of course it can be done, but are there any online resources available to look at?
Here's the backstory.
I am exploring the idea of using Haskell as a language to explore some of the concepts in a Discrete Structures course I am teaching. For this semester I have settled on Miranda, a smaller language that inspired Haskell. Miranda does about 90% of what I'd like it to do, but Haskell does about 2000%. :)
So my idea is to create a language that has exactly the features of Haskell that I'd like and disallows everything else. As the students progress, I can selectively "turn on" various features once they've mastered the basics.
Pedagogical "language levels" have been used successfully to teach Java and Scheme. By limiting what they can do, you can prevent them from shooting themselves in the foot while they are still mastering the syntax and concepts you are trying to teach. And you can offer better error messages.
I love your goal, but it's a big job. A couple of hints:
I've worked on GHC, and you don't want any part of the sources. Hugs is a much simpler, cleaner implementation but unfortunately it's in C.
It's a small piece of the puzzle, but Mark Jones wrote a beautiful paper called Typing Haskell in Haskell which would be a great starting point for your front end.
Good luck! Identifying language levels for Haskell, with supporting evidence from the classroom, would be of great benefit to the community and definitely a publishable result!
There is a complete Haskell parser: http://hackage.haskell.org/package/haskell-src-exts
Once you've parsed it, stripping out or disallowing certain things is easy. I did this for tryhaskell.org to disallow import statements, to support top-level definitions, etc.
Just parse the module:
parseModule :: String -> ParseResult Module
Then you have an AST for a module:
Module SrcLoc ModuleName [ModulePragma] (Maybe WarningText) (Maybe [ExportSpec]) [ImportDecl] [Decl]
The Decl type is extensive: http://hackage.haskell.org/packages/archive/haskell-src-exts/1.9.0/doc/html/Language-Haskell-Exts-Syntax.html#t%3ADecl
All you need to do is define a whitelist of what declarations, imports, symbols, and syntax are available, then walk the AST and throw a "parse error" on anything you don't want them to be aware of yet. You can use the SrcLoc value attached to every node in the AST:
data SrcLoc = SrcLoc
  { srcFilename :: String
  , srcLine     :: Int
  , srcColumn   :: Int
  }
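As a sketch of one whitelist rule (using the haskell-src-exts 1.x API quoted above; noImports is a hypothetical helper), rejecting any module that declares imports could look like:

import Language.Haskell.Exts

noImports :: String -> Either String Module
noImports src =
  case parseModule src of
    ParseFailed loc err -> Left (show loc ++ ": " ++ err)
    ParseOk m@(Module _ _ _ _ _ imports _)
      | null imports -> Right m
      | otherwise    -> Left "import declarations are not allowed yet"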
There's no need to re-implement Haskell. If you want to provide more friendly compile errors, just parse the code, filter it, send it to the compiler, and parse the compiler output. If it's a "couldn't match expected type a against inferred a -> b" then you know it's probably too few arguments to a function.
Unless you really, really want to spend time implementing Haskell from scratch or messing with the internals of Hugs, or some dumb implementation, I think you should just filter what gets passed to GHC. That way, if your students want to take their code base to the next step and write some real fully fledged Haskell code, the transition is transparent.
Do you want to build your interpreter from scratch? Begin with implementing an easier functional language like the lambda calculus or a lisp variant. For the latter there is a quite nice wikibook called Write yourself a Scheme in 48 hours giving a cool and pragmatic introduction into parsing and interpretation techniques.
Interpreting Haskell by hand will be much more complex, since you'll have to deal with highly complex features like type classes, an extremely powerful type system (type inference!) and lazy evaluation (reduction techniques).
So you should define a quite small subset of Haskell to work with, and then maybe start by extending the Scheme example step by step.
Addition:
Note that in Haskell, you have full access to the interpreter's API (at least under GHC), including parsers, compilers and of course interpreters.
The package to use is hint (Language.Haskell.*). I have unfortunately neither found online tutorials on this nor tried it out myself, but it looks quite promising.
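A minimal sketch of what using hint looks like (assuming the hint package's Language.Haskell.Interpreter module):

import Language.Haskell.Interpreter

main :: IO ()
main = do
  r <- runInterpreter $ do
         setImports ["Prelude"]
         interpret "map (*2) [1,2,3] :: [Int]" (as :: [Int])
  case r of
    Left err -> print err
    Right xs -> print xs   -- [2,4,6]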
create a language that has exactly the features of Haskell that I'd like and disallows everything else. As the students progress, I can selectively "turn on" various features once they've mastered the basics.
I suggest a simpler (as in less work involved) solution to this problem. Instead of creating a Haskell implementation where you can turn features off, wrap a Haskell compiler with a program that first checks that the code doesn't use any feature you disallow, and then uses the ready-made compiler to compile it.
That would be similar to HLint (and also kind of its opposite):
HLint (formerly Dr. Haskell) reads Haskell programs and suggests changes that hopefully make them easier to read. HLint also makes it easy to disable unwanted suggestions, and to add your own custom suggestions.
Implement your own HLint "suggestions" to not use the features you don't allow
Disable all the standard HLint suggestions.
Make your wrapper run your modified HLint as a first step
Treat HLint suggestions as errors. That is, if HLint "complained" then the program doesn't proceed to compilation stage
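A sketch of such a wrapper (file names are hypothetical; assumes hlint and ghc are on the PATH and ClassRules.hs holds your custom hints):

import System.Exit (ExitCode (..), exitWith)
import System.Process (rawSystem)

-- Run HLint as a gatekeeper; compile only if it reports nothing.
main :: IO ()
main = do
  lintResult <- rawSystem "hlint" ["--hint=ClassRules.hs", "Student.hs"]
  case lintResult of
    ExitSuccess -> rawSystem "ghc" ["Student.hs"] >>= exitWith
    failure     -> exitWith failure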
Baskell is a teaching implementation, http://hackage.haskell.org/package/baskell
You might start by picking just, say, the type system to implement. That's about as complicated as an interpreter for Scheme, http://hackage.haskell.org/package/thih
The EHC series of compilers is probably the best bet: it's actively developed and seems to be exactly what you want - a series of small lambda calculi compilers/interpreters culminating in Haskell '98.
But you could also look at the various languages developed in Pierce's Types and Programming Languages, or the Helium interpreter (a crippled Haskell intended for students http://en.wikipedia.org/wiki/Helium_(Haskell)).
If you're looking for a subset of Haskell that's easy to implement, you can do away with type classes and type checking. Without type classes, you don't need type inference to evaluate Haskell code.
I wrote a self-compiling Haskell subset compiler for a Code Golf challenge. It takes Haskell subset code on input and produces C code on output. I'm sorry there isn't a more readable version available; I lifted nested definitions by hand in the process of making it self-compiling.
For a student interested in implementing an interpreter for a subset of Haskell, I would recommend starting with the following features:
Lazy evaluation. If the interpreter is in Haskell, you might not have to do anything for this.
Function definitions with pattern-matched arguments and guards. Only worry about variable, cons, nil, and _ patterns.
Simple expression syntax:
Integer literals
Character literals
[] (nil)
Function application (left associative)
Infix : (cons, right associative)
Parentheses
Variable names
Function names
More concretely, write an interpreter that can run this:
-- tail :: [a] -> [a]
tail (_:xs) = xs
-- append :: [a] -> [a] -> [a]
append [] ys = ys
append (x:xs) ys = x : append xs ys
-- zipWith :: (a -> b -> c) -> [a] -> [b] -> [c]
zipWith f (a:as) (b:bs) = f a b : zipWith f as bs
zipWith _ _ _ = []
-- showList :: (a -> String) -> [a] -> String
showList _ [] = '[' : ']' : []
showList show (x:xs) = '[' : append (show x) (showItems show xs)
-- showItems :: (a -> String) -> [a] -> String
showItems show [] = ']' : []
showItems show (x:xs) = ',' : append (show x) (showItems show xs)
-- fibs :: [Int]
fibs = 0 : 1 : zipWith add fibs (tail fibs)
-- main :: String
main = showList showInt (take 40 fibs)
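(To run this, the interpreter also needs a few primitives the listing leaves undefined: add, take, and showInt.)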
Type checking is a crucial feature of Haskell. However, going from nothing to a type-checking Haskell compiler is very difficult. If you start by writing an interpreter for the above, adding type checking to it should be less daunting.
You might look at Happy (a yacc-like parser generator written in Haskell), which has a Haskell parser.
This might be a good idea - make a tiny version of NetLogo in Haskell. Here is the tiny interpreter.
See if Helium would make a better base to build upon than standard Haskell.
Uhc/Ehc is a series of compilers enabling/disabling various Haskell features.
http://www.cs.uu.nl/wiki/Ehc/WebHome#What_is_UHC_And_EHC
I've been told that Idris has a fairly compact parser, not sure if it's really suitable for alteration, but it's written in Haskell.
Andrej Bauer's Programming Language Zoo has a small implementation of a purely functional programming language somewhat cheekily named "minihaskell". It is about 700 lines of OCaml, so very easy to digest.
The site also contains toy versions of ML-style, Prolog-style and OO programming languages.
Don't you think it would be easier to take the GHC sources and strip out what you don't want, than it would be to write your own Haskell interpreter from scratch? Generally speaking, there should be a lot less effort involved in removing features as opposed to creating/adding features.
GHC is written in Haskell anyway, so technically that stays with your question of a Haskell interpreter written in Haskell.
It probably wouldn't be too hard to make the whole thing statically linked and then only distribute your customized GHCi, so that the students can't load other Haskell source modules. As to how much work it would take to prevent them from loading other Haskell object files, I have no idea. You might want to disable FFI too, if you have a bunch of cheaters in your classes :)
The reason why there are so many LISP interpreters is that LISP is basically a predecessor of JSON: a simple format to encode data. This makes the frontend part quite easy to handle. Compared to that, Haskell, especially with Language Extensions, is not the easiest language to parse.
These are some syntactical constructs that sound tricky to get right:
operators with configurable precedence, associativity, and fixity,
nested comments
layout rule
pattern syntax
do-blocks and desugaring to monadic code
Each of these, except maybe the operators, could be tackled by students after their Compiler Construction Course, but it would take the focus away from how Haskell actually works. In addition to that, you might not want to implement all syntactical constructs of Haskell directly, but instead implement passes to get rid of them. Which brings us to the literal core of the issue, pun fully intended.
My suggestion is to implement typechecking and an interpreter for Core instead of full Haskell. Both of these tasks are quite intricate by themselves already.
This language, while still a strongly typed functional language, is way less complicated to deal with in terms of optimization and code generation.
However, it is still independent of the underlying machine.
Therefore, GHC uses it as an intermediate language and translates most syntactic constructs of Haskell into it.
Additionally, you should not shy away from using GHC's (or another compiler's) frontend.
I'd not consider that as cheating since custom LISPs use the host LISP system's parser (at least during bootstrapping). Cleaning up Core snippets and presenting them to students, along with the original code, should allow you to give an overview of what the frontend does, and why it is preferable to not reimplement it.
Here are a few links to the documentation of Core as used in GHC:
System FC: equality constraints and coercions
GHC/As a library
The Core type
