Why was function application chosen as default Haskell operator, not composition? - haskell

Haskell syntax requires the relatively noisy f . g $ 3 compared to 3 g f as in stack-oriented languages. What were the main design arguments for this choice?

That could also be written f (g 3).
Why is Haskell not a concatenative language?
Based on A History of Haskell, Haskell was influenced by a variety of functional programming and lazy language experiments, including ML. As section 4 (Syntax) describes:
Currying
Following a tradition going back to Frege, a function of two arguments may be represented as a function of one argument that itself returns a function of one argument. This tradition was honed by Moses Schönfinkel and Haskell Curry and came to be called currying. Function application is denoted by juxtaposition and associates to the left. Thus, f x y is parsed (f x) y. This leads to concise and powerful code. For example, to square each number in a list we write map square [1,2,3], while to square each number in a list of lists we write map (map square) [[1,2],[3]]. Haskell, like many other languages based on lambda calculus, supports both curried and uncurried definitions […]
The concept of currying is so central to Haskell's semantics and the lambda calculus at its core that any other method of arrangement would interact poorly with the language.
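To make the curried/uncurried contrast from the quote concrete, here is a small sketch (the function names are mine, not from the paper):
-- Curried: one argument at a time; partial application comes for free.
addC :: Int -> Int -> Int
addC x y = x + y

-- Uncurried: a single tuple argument.
addU :: (Int, Int) -> Int
addU (x, y) = x + y

-- curry and uncurry from the Prelude convert between the two forms.
addC' :: Int -> Int -> Int
addC' = curry addU

succs :: [Int]
succs = map (addC 1) [1, 2, 3]   -- partial application: [2,3,4]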

The stack-oriented style doesn't so much compose as sequence functions; 3 g f in such a language is rather f $ g $ 3 in Haskell. Of course, that's equivalent to f . g $ 3, but it only works as long as you immediately apply the composition to some value. In Haskell, you very often compose functions just to hand them to some higher-order combinator, or to make a point-free definition. In a stack-oriented language that requires some sort of explicit block; in Haskell it requires just the . operator.
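A small illustration of that point (the helper names are mine; only Prelude functions are used):
-- A composition handed straight to a higher-order function, never applied
-- "by hand" to a value:
tidy :: [String] -> [String]
tidy = map (take 10 . filter (/= ' '))

-- A point-free definition: the composition itself is the definition.
countLongWords :: String -> Int
countLongWords = length . filter ((> 5) . length) . words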
Usually, you don't just chain "atomic" functions. Certainly you don't deal with globally-named single-letter functions, so the tiny . or $ doesn't really make a dramatic difference verbosity-wise. And very often, as rmmh said, you chain partially applied functions, e.g.
main = interact $ unlines . take 10 . filter ((>20) . length) . lines
That's much more cumbersome without cheap tight-binding application. Also, it's very natural to have the separating . to mark what's not immediately applied but just composed.
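For comparison, here is a sketch of the same pipeline written without (.) and ($); every composition becomes an explicit lambda or another layer of parentheses:
main :: IO ()
main = interact (\s -> unlines (take 10 (filter (\l -> length l > 20) (lines s))))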

If you're interested in the history of Haskell, Hudak, Hughes, Peyton Jones & Wadler's "A History of Haskell: Being Lazy with Class" is the best-known paper on this topic, and well worth reading.
It doesn't address your question directly, but it does point out one very relevant fact: Haskell was created as a unifying compromise between a bunch of existing languages from small teams. Quoting section 2.2 ("A tower of Babel"):
As a result of all this activity, by the mid-1980s there were a number of researchers, including the authors, who were keenly interested in both design and implementation techniques for pure, lazy languages. In fact, many of us had independently designed our own lazy languages and were busily building our own implementations for them. We were each writing papers about our efforts, in which we first had to describe our languages before we could describe our implementation techniques. Languages that contributed to this lazy Tower of Babel include:
Miranda […]
Lazy ML (LML) […]
Orwell […]
Alfl […]
Id […]
Clean […]
Ponder […]
Daisy […]
So the answer may simply be that Haskell copied this from its predecessor languages. And since a bunch of these languages were in turn based on or inspired by Lisp and ML, they may analogously have copied it from them. So, to return to your question:
What were main design arguments for this choice?
Chances are that there was never a sustained argument for the choice. Very few high-level languages have gone for the stack-based design, in any case, and few people know them.

My guess would be lambda calculus and usefulness (in real-world scenarios).
In the lambda calculus, juxtaposition (a space) denotes application, so the notation feels familiar to people who know it.
In most commonly used languages, the usual thing to do with a function is to apply it. Haskell is not a stack-based language, so that's the choice that was made.

Related

Does Prolog use lazy evaluation like Haskell? [duplicate]

Because Prolog uses chronological backtracking (from the Prolog Wikipedia page) even after an answer is found (in this example, where there can only be one solution), would this justify saying that Prolog uses eager evaluation?
mother_child(trude, sally).
father_child(tom, sally).
father_child(tom, erica).
father_child(mike, tom).
sibling(X, Y) :- parent_child(Z, X), parent_child(Z, Y).
parent_child(X, Y) :- father_child(X, Y).
parent_child(X, Y) :- mother_child(X, Y).
With the following output:
?- sibling(sally, erica).
true ;
false.
To summarize the discussion with @WillNess below: yes, Prolog is strict. However, Prolog's execution model and semantics are substantially different from the languages that are usually labelled strict or non-strict. For more about this, see below.
I'm not sure the question really applies to Prolog, because it doesn't really have the kind of implicit evaluation ordering that other languages have. Where this really comes into play in a language like Haskell, you might have an expression like:
f (g x) (h y)
In a strict language like ML, there is a defined evaluation order: g x will be evaluated, then h y, and f (g x) (h y) last. In a language like Haskell, g x and h y will only be evaluated as required ("non-strict" is more accurate than "lazy"). But in Prolog,
f(g(X), h(Y))
does not have the same meaning, because it isn't using a function notation. The query would be broken down into three parts, g(X, A), h(Y, B), and f(A,B,C), and those constituents can be placed in any order. The evaluation strategy is strict in the sense that what comes earlier in a sequence will be evaluated before what comes next, but it is non-strict in the sense that there is no requirement that variables be instantiated to ground terms before evaluation can proceed. Unification is perfectly content to complete without having given you values for every variable. I am bringing this up because you have to break down a complex, nested expression in another language into several expressions in Prolog.
Backtracking has nothing to do with it, as far as I can tell. I don't think backtracking to the nearest choice point and resuming from there precludes a non-strict evaluation method, it just happens that Prolog's is strict.
That Prolog pauses after giving each of the several correct answers to a problem has nothing to do with laziness; it is a part of its user interaction protocol. Each answer is calculated eagerly.
Sometimes there will be only one answer but Prolog doesn't know that in advance, so it waits for us to press ; to continue search, in hopes of finding another solution. Sometimes it is able to deduce it in advance and will just stop right away, but only sometimes.
update:
Prolog does no evaluation on its own. All terms are unevaluated, as if "quoted" in Lisp.
Prolog will unfold your predicate definitions as written and is perfectly happy to keep your data structures full of unevaluated uninstantiated holes, if so entailed by your predicate definitions.
Haskell does not need any values, a user does, when requesting an output.
Similarly, Prolog produces solutions one-by-one, as per the user requests.
Prolog can even be seen as lazier than Haskell: in Haskell all arithmetic is strict, i.e. immediate, whereas in Prolog you have to explicitly request arithmetic evaluation with is/2.
So perhaps the question is ill-posed. Prolog's operational model is just too different. There are no "results" nor "functions", for one; but viewed from another angle, everything is a result, and predicates are "multi"-functions.
As it stands, the question is not correct in what it states. Chronological backtracking does not mean that Prolog will necessarily backtrack "in an example where there can be only one solution".
Consider this:
foo(a, 1).
foo(b, 2).
foo(c, 3).
?- foo(b, X).
X = 2.
?- foo(X, 2).
X = b.
So this is an example that does have only one solution and Prolog recognizes that, and does not attempt to backtrack. There are cases in which you can implement a solution to a problem in a way that Prolog will not recognize that there is only one logical solution, but this is due to the implementation and is not inherent to Prolog's execution model.
You should read up on Prolog's execution model. From the Wikipedia article which you seem to cite, "Operationally, Prolog's execution strategy can be thought of as a generalization of function calls in other languages, one difference being that multiple clause heads can match a given call. In that case, [emphasis mine] the system creates a choice-point, unifies the goal with the clause head of the first alternative, and continues with the goals of that first alternative." Read Sterling and Shapiro's "The Art of Prolog" for a far more complete discussion of the subject.
From Wikipedia I got:
In eager evaluation, an expression is evaluated as soon as it is bound to a variable.
Then I think there are two levels: at the user level (our predicates), Prolog is not eager.
But it is at the 'system' level, because variables are implemented as efficiently as possible.
Indeed, attributed variables are implemented to be lazy, and are rather 'orthogonal' to 'logic' Prolog variables.

What are super combinators and constant applicative forms?

I'm struggling with what Super Combinators are:
A supercombinator is either a constant, or a combinator which contains only supercombinators as subexpressions.
And also with what Constant Applicative Forms are:
Any super combinator which is not a lambda abstraction. This includes truly constant expressions such as 12, ((+) 1 2), [1,2,3] as well as partially applied functions such as ((+) 4). Note that this last example is equivalent under eta abstraction to \ x -> (+) 4 x which is not a CAF.
This is just not making any sense to me! Isn't ((+) 4) just as "truly constant" as 12? CAFs sound like values to my simple mind.
These Haskell wiki pages you reference are old, and I think unfortunately written. Particularly unfortunate is that they mix up CAFs and supercombinators. Supercombinators are interesting but unrelated to GHC. CAFs are still very much a part of GHC, and can be understood without reference to supercombinators.
So let's start with supercombinators. Combinators derive from combinatory logic, and, in the usage here, consist of functions which only apply the values passed in to one another in one or another form -- i.e. they combine their arguments. The most famous set of combinators are S, K, and I, which taken together are Turing-complete. Supercombinators, in this context, are functions built only of values passed in, combinators, and other supercombinators. Hence any supercombinator can be expanded, through substitution, into a plain old combinator.
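For reference, a minimal sketch of S, K, and I written as ordinary Haskell definitions; each one does nothing but apply its arguments to one another:
s :: (a -> b -> c) -> (a -> b) -> a -> c
s f g x = f x (g x)

k :: a -> b -> a
k x _ = x

i :: a -> a
i x = x          -- also derivable as i = s k k

-- A supercombinator in this sense is built only from its parameters,
-- combinators, and other supercombinators, e.g.:
twice :: (a -> a) -> a -> a
twice f x = f (f x)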
Some compilers for functional languages (not GHC!) use combinators and supercombinators as intermediate steps in compilation. As with any similar compiler technology, the reason for doing this is to admit optimization analysis that is more easily performed in such a simplified, minimal language. One such core language built on supercombinators is Edwin Brady's epic.
Constant Applicative Forms are something else entirely. They're a bit more subtle, and have a few gotchas. The way to think of them is as an aspect of compiler implementation with no separate semantic meaning but with a potentially profound effect on runtime performance. The following may not be a perfect description of a CAF, but it'll try to convey my intuition of what one is, since I haven't seen a really good description anywhere else for me to crib from. The clean "authoritative" description in the GHC Commentary Wiki reads as follows:
Constant Applicative Forms, or CAFs for short, are top-level values
defined in a program. Essentially, they are objects that are not
allocated dynamically at run-time but, instead, are part of the static
data of the program.
That's a good start. Pure, functional, lazy languages can be thought of in some sense as a graph reduction machine. The first time you demand the value of a node, that forces its evaluation, which in turn can demand the values of subnodes, etc. Once a node is evaluated, the resultant value sticks around (although it does not have to stick around -- since this is a pure language we could always keep the subnodes live and recalculate with no semantic effect). A CAF is indeed just a value. But, in the context, a special kind of value -- one which the compiler can determine has a meaning entirely dependent on its subnodes. That is to say:
foo x = ...
  where thisIsACaf = [1..10::Int]

        thisIsNotACaf = [1..x::Int]

        thisIsAlsoNotACaf :: Num a => [a]
        thisIsAlsoNotACaf = [1..10] -- oops, polymorphic! the "num" dictionary is implicitly a parameter.

        thisCouldBeACaf = const [1..10::Int] x -- requires a sufficiently smart compiler

        thisAlsoCouldBeACaf _ = [1..10::Int] -- also requires a sufficiently smart compiler
So why do we care if things are CAFs? Basically because sometimes we really really don't want to recompute something (for example, a memotable!) and so want to make sure it is shared properly. Other times we really do want to recompute something (e.g. a huge boring easy to generate list -- such as the naturals -- which we're just walking over) and not have it stick around in memory forever. A combination of naming things and binding them under lets or writing them inline, etc. typically lets us specify these sorts of things in a natural, intuitive way. Occasionally, however, the compiler is smarter or dumber than we expect, and something we think should only be computed once is always recomputed, or something we don't want to hang on to gets lifted out as a CAF. Then, we need to think things through more carefully. See this discussion to get an idea about some of the trickiness involved: A good way to avoid "sharing"?
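As a concrete, hedged illustration of the memo-table point: a top-level list of results is a CAF, so every call shares the same partially forced structure, whereas the same list built inside a function body may be rebuilt on each call.
-- Top-level CAF: whatever prefix of this list gets forced is computed once
-- and retained for the whole run of the program.
fibs :: [Integer]
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

fib :: Int -> Integer
fib n = fibs !! n   -- repeated calls reuse the already-evaluated prefix

main :: IO ()
main = print (fib 80, fib 80)   -- the second lookup is essentially free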
[By the way, I don't feel up to it, but anyone that wants to should feel free to take as much of this answer as they want to try and integrate it with the existing Haskell Wiki pages and improve/update them]
Matt is right in that the definition is confusing. It is even contradictory. A CAF is defined as:
Any super combinator which is not a lambda abstraction. This includes
truly constant expressions such as 12, ((+) 1 2), [1,2,3] as
well as partially applied functions such as ((+) 4).
Hence, ((+) 4) is seen as a CAF. But in the very next sentence we're told it is equivalent to something that is not a CAF:
this last example is equivalent under eta abstraction to \ x -> (+) 4 x which is not a CAF.
It would be cleaner to rule out partially applied functions on the ground that they are equivalent to lambda abstractions.

Are there any purely functional Schemes or Lisps?

I've played around with a few functional programming languages and really enjoy the s-expr syntax used by Lisps (Scheme in particular).
I also see the advantages of working in a purely functional language. Therefore:
Are there any purely functional Schemes (or Lisps in general)?
The new Racket language (formerly PLT Scheme) allows you to implement any semantics you like with s-expressions (really any syntax). The base language is an eagerly evaluated, dynamically typed Scheme variant, but some notable languages built on top are a lazy Scheme and a functional reactive system called FrTime (pronounced "Father Time").
An easy way to make a purely functional language in Racket is to take the base language and not provide any procedures that mutate state. For example:
#lang racket/base
(provide (except-out (all-from-out racket/base) set! ...more here...))
makes up a language that has no set!.
I don't believe there are any purely functional Lisps, but Clojure is probably the closest.
Rich Hickey, the creator of Clojure:
Why did I write yet another programming language?
Basically because I wanted a Lisp for Functional Programming
designed for Concurrency and couldn't find one.
http://clojure.org/rationale
Clojure is functional, with immutable data types and variables, but you can get mutable behavior in some special cases or by dropping down to Java (Clojure runs on the JVM).
This is by design - another quote by Rich is
A purely functional programming
language is only good for heating your
computer.
See the presentation of Clojure for Lisp programmers.
Are there any purely functional Schemes (or Lisps in general)?
The ACL2 theorem prover is a pure Lisp. It is, however, intended for theorem proving rather than programming, and in particular it is limited to first-order programs. It has, however, been extremely successful in its niche.
Among other things, it won the 2005 ACM Software System Award.
Probably not, at least not as anything other than toys/proofs of concept. Note that even Haskell isn't 100% purely functional--it has secret escape hatches, and anything in IO is only "pure" in some torturous, hand-waving sense of the word.
So, that said, do you really need a purely functional language? You can write purely functional code in almost any language, with varying degrees of inconvenience and inefficiency.
Of course, languages that assume universal state-modification make it painful to keep things pure, so perhaps what you really want is a language that encourages immutability? In that case, you might find it worthwhile to take a look at Clojure's philosophy. And it's a Lisp, to boot!
As a final note, do realize that most of Haskell's "syntax" is thick layers of sugar. The underlying language is not much more than a typed lambda calculus, and nothing stops you from writing all your code that way. You may get funny looks from other Haskell programmers, though. There's also Liskell but I'm not sure what state it's in these days.
On a final, practical note: If you want to actually write code you intend to use, not just tinker with stuff for fun, you'll really want a clever compiler that knows how to work with pure code/immutable data structures.
inconsistent and non-extendable syntax
What is "inconsistency" here?
It is odd to base a language choice solely on syntax. After all, learning syntax will take a few hours -- it is a tiny fraction of the investment required.
In comparison, important considerations like speed, typing discipline, portability, breadth of libraries, documentation and community, have far greater impact on whether you can be productive.
Ignoring all the flame bait, a quick google for immutable Scheme yields some results:
http://blog.plt-scheme.org/2007/11/getting-rid-of-set-car-and-set-cdr.html
30 years ago there was Lispkit Lisp.
Not sure how accessible it is today.
[That's one of the places where I learnt functional programming.]
There is Owl Lisp, a dialect of Scheme (R5RS) with all data structures made immutable and some additional pure data structures. It is not a large project, but it seems to be actively developed and used by a small group of people (from what I can see on the website & git repository). There are also plans to include R7RS support and some sort of type inference. So while probably not ready for production use, this might be a fun thing to play with.
If you like lisp's syntax then you can actually do similar things in Haskell
let fibs = ((++) [1, 1] (zipWith (+) fibs (tail fibs)))
Setting the let fibs = part aside: you can always use s-expr syntax in Haskell expressions. This is because you can always add parentheses on the outside and it won't matter. This is the same code without redundant parentheses:
let fibs = (++) [1, 1] (zipWith (+) fibs (tail fibs))
And here it is in "typical" Haskell style:
let fibs = [1, 1] ++ zipWith (+) fibs (tail fibs)
There are a couple of projects that aim to use haskell underneath a lispy syntax. The older, deader, and more ponderous one is "Liskell". The newer, more alive, and lighter weight one is hasp. I think you might find it worth a look.

Haskell or Standard ML for beginners? [closed]

I'm going to be teaching a lower-division course in discrete structures. I have selected the text book Discrete Structures, Logic, and Computability in part because it contains examples and concepts that are conducive to implementation with a functional programming language. (I also think it's a good textbook.)
I want an easy-to-understand FP language to illustrate DS concepts and that the students can use. Most students will have had only one or two semesters of programming in Java, at best. After looking at Scheme, Erlang, Haskell, Ocaml, and SML, I've settled on either Haskell or Standard ML. I'm leaning towards Haskell for the reasons outlined below, but I'd like the opinion of those who are active programmers in one or the other.
Both Haskell and SML have pattern matching which makes describing a recursive algorithm a cinch.
Haskell has nice list comprehensions that match nicely with the way such lists are expressed mathematically.
Haskell has lazy evaluation. Great for constructing infinite lists using the list comprehension technique.
SML has a truly interactive interpreter in which functions can be both defined and used. In Haskell, functions must be defined in a separate file and compiled before being used in the interactive shell.
SML gives explicit confirmation of the function argument and return types in a syntax that's easy to understand. For example: val foo = fn : int * int -> int. Haskell's implicit curry syntax is a bit more obtuse, but not totally alien. For example: foo :: Int -> Int -> Int.
Haskell uses arbitrary-precision integers by default. It's an external library in SML/NJ. And SML/NJ truncates output to 70 characters by default.
Haskell's lambda syntax is subtle -- it uses a single backslash. SML is more explicit. Not sure if we'll ever need lambda in this class, though.
Essentially, SML and Haskell are roughly equivalent. I lean toward Haskell because I'm loving the list comprehensions and infinite lists in Haskell. But I'm worried that the extensive number of symbols in Haskell's compact syntax might cause students problems. From what I've gathered reading other posts on SO, Haskell is not recommended for beginners starting out with FP. But we're not going to be building full-fledged applications, just trying out simple algorithms.
What do you think?
Edit: Upon reading some of your great responses, I should clarify some of my bullet points.
In SML, there's no syntactic distinction between defining a function in the interpreter and defining it in an external file. Let's say you want to write the factorial function. In Haskell you can put this definition into a file and load it into GHCi:
fac 0 = 1
fac n = n * fac (n-1)
To me, that's clear, succinct, and matches the mathematical definition in the book. But if you want to write the function in GHCi directly, you have to use a different syntax:
let fac 0 = 1; fac n = n * fac (n-1)
When working with interactive interpreters, from a teaching perspective it's very, very handy when the student can use the same code in both a file and the command line.
By "explicit confirmation of the function," I meant that upon defining the function, SML right away tells you the name of the function, the types of the arguments, and the return type. In Haskell you have to use the :type command and then you get the somewhat confusing curry notation.
One more cool thing about Haskell -- this is a valid function definition:
fac 0 = 1
fac (n+1) = (n+1) * fac n
Again, this matches a definition they might find in the textbook. Can't do that in SML!
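(A caveat from a later vantage point: n+k patterns were part of Haskell 98 but were removed in Haskell 2010, so a current GHC only accepts that definition with the NPlusKPatterns extension enabled.)
{-# LANGUAGE NPlusKPatterns #-}   -- re-enables the Haskell 98 n+k pattern

fac :: Integer -> Integer
fac 0       = 1
fac (n + 1) = (n + 1) * fac n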
Much as I love Haskell, here are the reasons I would prefer SML for a class in discrete math and data structures (and most other beginners' classes):
Time and space costs of Haskell programs can be very hard to predict, even for experts. SML offers much more limited ways to blow the machine.
Syntax for function definition in an interactive interpreter is identical to syntax used in a file, so you can cut and paste.
Although operator overloading in SML is totally bogus, it is also simple. It's going to be hard to teach a whole class in Haskell without having to get into type classes.
Student can debug using print. (Although, as a commenter points out, it is possible to get almost the same effect in Haskell using Debug.Trace.trace; see the sketch after this list.)
Infinite data structures blow people's minds. For beginners, you're better off having them define a stream type complete with ref cells and thunks, so they know how it works:
datatype 'a thunk_contents = UNEVALUATED of unit -> 'a
                           | VALUE of 'a
type 'a thunk = 'a thunk_contents ref

val delay : (unit -> 'a) -> 'a thunk
val force : 'a thunk -> 'a
Now it's not magic any more, and you can go from here to streams (infinite lists).
Layout is not as simple as in Python and can be confusing.
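Regarding the print-debugging point above, here is a minimal sketch of the Debug.Trace.trace near-equivalent mentioned there; note that the message is emitted when the wrapped value is demanded, which under lazy evaluation need not match source order:
import Debug.Trace (trace)

fac :: Integer -> Integer
fac 0 = 1
fac n = trace ("fac " ++ show n) (n * fac (n - 1))

main :: IO ()
main = print (fac 3)   -- prints the trace lines, then 6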
There are two places Haskell has an edge:
In core Haskell you can write a function's type signature just before its definition. This is hugely helpful for students and other beginners. There just isn't a nice way to deal with type signatures in SML.
Haskell has better concrete syntax. The Haskell syntax is a major improvement over ML syntax. I have written a short note about when to use parentheses in an ML program; this helps a little.
Finally, there is a sword that cuts both ways:
Haskell code is pure by default, so your students are unlikely to stumble over impure constructs (IO monad, state monad) by accident. But by the same token, they can't print, and if you want to do I/O then at minimum you have to explain do notation, and return is confusing.
On a related topic, here is some advice for your course preparation: don't overlook Purely Functional Data Structures by Chris Okasaki. Even if you don't have your students use it, you will definitely want to have a copy.
We teach Haskell to first years at our university. My feelings about this are a bit mixed. On the one hand teaching Haskell to first years means they don't have to unlearn the imperative style. Haskell can also produce very concise code which people who had some Java before can appreciate.
Some problems I've noticed students often have:
Pattern matching can be a bit difficult, at first. Students initially had some problems seeing how value construction and pattern matching are related. They also had some problems distinguishing between abstractions. Our exercises included writing functions that simplify arithmetic expressions, and some students had difficulty seeing the difference between the abstract representation (e.g., Const 1) and the meta-language representation (1).
Furthermore, if your students are supposed to write list processing functions themselves, be careful pointing out the difference between the patterns
[]
[x]
(x:xs)
[x:xs]
Depending on how much functional programming you want to teach them on the way, you may just give them a few library functions and let them play around with that.
We didn't teach our students about anonymous functions, we simply told them about where clauses. For some tasks this was a bit verbose, but worked well otherwise. We also didn't tell them about partial applications; this is probably quite easy to explain in Haskell (due to its form of writing types) so it might be worth showing to them.
They quickly discovered list comprehensions and preferred them over higher-order functions like filter, map, zipWith (see the comparison sketch after this list).
I think we missed out a bit on teaching them how to let them guide their thoughts by the types. I'm not quite sure, though, whether this is helpful to beginners or not.
Error messages are usually not very helpful to beginners, they might occasionally need some help with these. I haven't tried it myself, but there's a Haskell compiler specifically targeted at newcomers, mainly by means of better error messages: Helium
For the small programs, things like possible space leaks weren't an issue.
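To illustrate the preference mentioned above, here is the same small computation in both styles (a sketch, with names of my choosing):
-- Comprehension style, close to set-builder notation:
evensDoubled :: [Int] -> [Int]
evensDoubled xs = [2 * x | x <- xs, even x]

-- Equivalent higher-order style:
evensDoubled' :: [Int] -> [Int]
evensDoubled' = map (2 *) . filter even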
Overall, Haskell is a good teaching language, but there are a few pitfalls. Given that students feel a lot more comfortable with list comprehensions than higher-order functions, this might be the argument you need. I don't know how long your course is or how much programming you want to teach them, but do plan some time for teaching them basic concepts--they will need it.
BTW,
SML has a truly interactive interpreter in which functions can be both defined and used. In Haskell, functions must be defined in a separate file and compiled before being used in the interactive shell.
is inaccurate. Use GHCi:
Prelude> let f x = x ^ 2
Prelude> f 7
49
Prelude> f 2
4
There are also good resources for Haskell in education on the haskell.org education page, with experiences from different teachers: http://haskell.org/haskellwiki/Haskell_in_education
Finally, you'll be able to teach them multicore parallelism just for fun, if you use Haskell :-)
Many universities teach Haskell as a first functional language or even a first programming language, so I don't think this will be a problem.
Having done some of the teaching on one such course, I don't agree that the possible confusions you identify are that likely. The most likely sources of early confusion are parsing errors caused by bad layout, and mysterious messages about type classes when numeric literals are used incorrectly.
I'd also disagree with any suggestion that Haskell is not recommended for beginners starting out with FP. It's certainly the big bang approach in ways that strict languages with mutation aren't, but I think that's a very valid approach.
SML has a truly interactive interpreter in which functions can be both defined and used. In Haskell, functions must be defined in a separate file and compiled before being used in the interactive shell.
While Hugs may have that limitation, GHCi does not:
$ ghci
GHCi, version 6.10.1: http://www.haskell.org/ghc/ :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer ... linking ... done.
Loading package base ... linking ... done.
Prelude> let hello name = "Hello, " ++ name
Prelude> hello "Barry"
"Hello, Barry"
There's many reasons I prefer GHC(i) over Hugs, this is just one of them.
SML gives explicit confirmation of the function argument and return types in a syntax that's easy to understand. For example: val foo = fn : int * int -> int. Haskell's implicit curry syntax is a bit more obtuse, but not totally alien. For example: foo :: Int -> Int -> Int.
SML has what you call "implicit curry" syntax as well.
$ sml
Standard ML of New Jersey v110.69 [built: Fri Mar 13 16:02:47 2009]
- fun add x y = x + y;
val add = fn : int -> int -> int
Essentially, SML and Haskell are roughly equivalent. I lean toward Haskell because I'm loving the list comprehensions and infinite lists in Haskell. But I'm worried that the extensive number of symbols in Haskell's compact syntax might cause students problems. From what I've gathered reading other posts on SO, Haskell is not recommended for beginners starting out with FP. But we're not going to be building full-fledged applications, just trying out simple algorithms.
I like using Haskell much more than SML, but I would still teach SML first.
Seconding nominolo's thoughts, list comprehensions do seem to slow students from getting to some higher-order functions.
If you want laziness and infinite lists, it's instructive to implement it explicitly.
Because SML is eagerly evaluated, the execution model is far easier to comprehend, and "debugging via printf" works a lot better than in Haskell.
SML's type system is also simpler. While your class likely wouldn't use them anyways, Haskell's typeclasses are still an extra bump to get over -- getting them to understand the 'a versus ''a distinction in SML is tough enough.
Most answers were technical, but I think you should consider at least one that is not: Haskell (as OCaml), at this time, has a bigger community using it in a wider range of contexts. There's also a big database of libraries and applications written for profit and fun at Hackage. That may be an important factor in keeping some of your students using the language after your course is finished, and maybe trying other functional languages (like Standard ML) later.
I am amazed you are not considering OCaml and F# given that they address so many of your concerns. Surely decent and helpful development environments are a high priority for learners? SML is way behind and F# is way ahead of all other FPLs in that respect.
Also, both OCaml and F# have list comprehensions.
Haskell. I'm ahead in my algos/theory class in CS because of the stuff I learned from using Haskell. It's such a comprehensive language, and it will teach you a ton of CS, just by using it.
However, SML is much easier to learn. Haskell has features such as lazy evaluation and control structures that make it much more powerful, but with the cost of a steep(ish) learning curve. SML has no such curve.
That said, most of learning Haskell was unlearning habits from less scientific/mathematical languages such as Ruby, ObjC, or Python.

Examining the internals of functions in Haskell

I am a Haskell newbie, though had a previous Lisp/Scheme experience. Right now I am looking at the examples from SICP and trying to implement them in Haskell to get more hands-on experience. In the lecture 3b authors present a function for computing the derivatives symbolically. It contains, among others, the following lines:
(define (deriv exp var)
  (cond ((constant? exp var) 0)
        ((same-var? exp var) 1)
        ; ...
Further in the lecture, some more functions are defined:
(define (constant? exp var)
  (and (atom? exp)
       (not (eq? exp var))))
Is there a way to do same thing in Haskell, i.e. check for atomicity and symbolic equivalence to some other function? Or more general, what are the means of "disassembling" functions in Haskell?
Firstly, although SICP is great, I would recommend against it for learning Haskell.(#) Some of the difficulty in this question stems from this.
In Lisp/Scheme, a 'function' is thought of as a piece of code, and examining a function simply means examining its code. In Haskell, a 'function' means something closer to its mathematical definition, as a map from a set A to a set B. So for example, it makes sense, in the Lisp context, to compare two functions: just compare their code. (But are (x+y)^2 and x^2+2*x*y+y^2 different functions?) In Haskell, it depends on whether there exists a constructive procedure for determining equality for the class of functions you are considering.
Similarly, as in your question, in Lisp/Scheme, you would write a "derive" function that differentiates correctly when given expressions, and just errors out or returns garbage on arbitrary inputs. Under Haskell's type system, this is (AFAIK) impossible to do, because—if you think about it—there is no such thing as differentiating an arbitrary input: you can only differentiate an Expression (or possibly a more general class, but still not everything). So as in Norman Ramsey's answer, you first define an "Expression" type (or type class), which is very simple to do, and then write the function
derive :: Expression -> Expression
that disassembles an Expression using the pattern-matching constructs (or something else depending on how Expressions were built up).
(#): The reason is that SICP has an entirely different philosophy, which involves using an untyped programming language and encouraging a lack of distinction between code and data. While there is some merit to the "code=data" argument (e.g. the fact that on the von Neumann architecture we use, "everything is 0s and 1s anyway"), it's not necessarily a good way of reasoning about or modelling problems. (See Philip Wadler's Why Calculating is Better than Scheming for more on this.) If you want to read a Haskell book with a functional flavour instead of a Real World one, perhaps Simon Thompson's Haskell: The Craft of Functional Programming or Richard Bird's Introduction to Functional Programming using Haskell are better choices.
Your Scheme examples don't actually examine Scheme functions. I recently did some symbolic differentiation in Haskell over values of the following type:
data Exp a = Lit a
           | Exp a :*: Exp a
           | Exp a :+: Exp a
           | Var String
           deriving Eq
Instead of discriminating using atom? or eq? you use case (or other pattern matching) and ==.
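Building on that Exp type, a hedged sketch of what a pattern-matching deriv might look like (the differentiation rules and the Num constraint are my additions, not part of the original answer):
-- Differentiate with respect to a named variable, by case analysis on the
-- constructors of Exp rather than by inspecting code at run time.
deriv :: Num a => Exp a -> String -> Exp a
deriv (Lit _)   _ = Lit 0                                     -- constants
deriv (Var x)   v = if x == v then Lit 1 else Lit 0           -- variables
deriv (f :+: g) v = deriv f v :+: deriv g v                   -- sum rule
deriv (f :*: g) v = (deriv f v :*: g) :+: (f :*: deriv g v)   -- product rule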
I don't think you can do that. Lisp is homoiconic, Haskell isn't.
However, further Googling turned up Liskell, which is (?) an interesting hybrid.
