Will I develop good/bad habits because of lazy evaluation? - haskell

I'm looking to learn functional programming with either Haskell or F#.
Are there any programming habits (good or bad) that could form as a result of Haskell's lazy evaluation? I like the idea of Haskell's functional programming purity for the purposes of understanding functional programming. I'm just a bit worried about two things:
I may misinterpret lazy-evaluation-based features as being part of the "functional paradigm".
I may develop thought patterns that work in a lazy world but not in a normal order/eager evaluation world.

There are habits that you get into when programming in a lazy language that don't work in a strict language. Some of these seem so natural to Haskell programmers that they don't think of them as lazy evaluation. A couple of examples off the top of my head:
f x y = if x > y then .. a .. b .. else c
  where
    a = expensive
    b = expensive
    c = expensive
Here we define a bunch of subexpressions in a where clause, with complete disregard for which of them will ever be evaluated. It doesn't matter: the compiler will ensure that no unnecessary work is performed at runtime. Non-strict semantics means that the compiler is able to do this. Whenever I write in a strict language I trip over this a lot.
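A minimal runnable sketch of the same idea. The bodies of a, b and c are hypothetical stand-ins for "expensive"; c is deliberately an error so that you can see it is never touched when x > y:

```haskell
-- Hypothetical stand-ins for the "expensive" subexpressions.
f :: Int -> Int -> Int
f x y = if x > y then a + b else c
  where
    a = x * 1000                 -- forced only on the "then" branch
    b = y * 1000                 -- forced only on the "then" branch
    c = error "c was evaluated"  -- forced only on the "else" branch
```

In GHCi, f 2 1 returns a number without ever hitting the error, because the binding for c is never demanded on that branch.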
Another example that springs to mind is "numbering things":
pairs = zip xs [1..]
Here we just want to associate each element in a list with its index, and zipping with the infinite list [1..] is the natural way to do it in Haskell. How do you write this without an infinite list? Well, the fold isn't too readable:
pairs = foldr (\x xs -> \n -> (x,n) : xs (n+1)) (const []) xs 1
or you could write it with explicit recursion (too verbose, doesn't fuse). There are several other ways to write it, none of which are as simple and clear as the zip.
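For comparison, here is a small sketch of the zip next to the explicit-recursion version mentioned above. The names numberLazy and numberExplicit are mine, chosen for illustration:

```haskell
-- The idiomatic version: zip stops as soon as the finite list runs out.
numberLazy :: [a] -> [(a, Int)]
numberLazy xs = zip xs [1 ..]

-- The explicit-recursion version: the counter is threaded by hand.
numberExplicit :: [a] -> [(a, Int)]
numberExplicit = go 1
  where
    go _ []       = []
    go n (y : ys) = (y, n) : go (n + 1) ys
```

Both give the same result on finite input; the second just spells out the bookkeeping that the infinite list hides.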
I'm sure there are many more. Laziness is surprisingly useful, when you get used to it.

You'll certainly learn about evaluation strategies. Non-strict evaluation strategies can be very powerful for particular kinds of programming problems, and once you're exposed to them, you may be frustrated that you can't use them in some language setting.
I may develop thought patterns that work in a lazy world but not in a normal order/eager evaluation world.
Right. You'll be a more rounded programmer. Abstractions that provide "delaying" mechanisms are fairly common now, so you'd be a worse programmer not to know them.

I may misinterpret lazy-evaluation-based features as being part of the "functional paradigm".
Lazy evaluation is an important part of the functional paradigm. It's not a requirement - you can program functionally with eager evaluation - but it's a tool that naturally fits functional programming.
You see people explicitly implement/invoke it (notably in the form of lazy sequences) in languages that don't make it the default; and while mixing it with imperative code requires caution, pure functional code allows safe use of laziness. And since laziness makes many constructs cleaner and more natural, it's a great fit!
(Disclaimer: no Haskell or F# experience)

To expand on Beni's answer: if we ignore operational aspects in terms of efficiency (and stick with a purely functional world for the moment), every terminating expression under eager evaluation is also terminating under non-strict evaluation, and the values of both (their denotations) coincide.
This is to say that lazy evaluation is strictly more expressive than eager evaluation. By allowing you to write more correct and useful expressions, it expands your "vocabulary" and ability to think functionally.
Here's one example of why:
A language can be lazy by default with optional eagerness, or eager by default with optional laziness, but in fact it's been shown (cf. Okasaki, for example) that there are certain purely functional data structures which can only achieve certain orders of performance if implemented in a language that provides laziness either optionally or by default.
Now when you do want to worry about efficiency, then the difference does matter, and sometimes you will want to be strict and sometimes you won't.
But worrying about strictness is a good thing, because very often the cleanest thing to do (and not only in a lazy-by-default language) is to use a thoughtful mix of lazy and eager evaluation, and thinking along these lines will be a good thing no matter which language you wind up using in the future.
Edit: Inspired by Simon's post, one additional point: many problems are most naturally thought about as traversals of infinite structures rather than basically recursive or iterative. (Although such traversals themselves will generally involve some sort of recursive call.) Even for finite structures, very often you only want to explore a small portion of a potentially large tree. Generally speaking, non-strict evaluation allows you to stop mixing up the operational issue of what the processor actually bothers to figure out with the semantic issue of the most natural way to represent the actual structure you're using.
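As a rough illustration of exploring only a small part of an infinite structure, here is a sketch with a made-up infinite tree (binTree, prune and flatten are illustrative names, not library functions):

```haskell
-- A rose tree; nothing here says whether it is finite.
data Tree a = Node a [Tree a]

-- An infinite tree: node n has children 2n and 2n+1.
binTree :: Integer -> Tree Integer
binTree n = Node n [binTree (2 * n), binTree (2 * n + 1)]

-- Keep only the top d levels below the root (assumes d >= 0).
prune :: Int -> Tree a -> Tree a
prune 0 (Node x _)  = Node x []
prune d (Node x ts) = Node x (map (prune (d - 1)) ts)

-- Pre-order listing of the labels.
flatten :: Tree a -> [a]
flatten (Node x ts) = x : concatMap flatten ts
```

The definition of the tree says nothing about how much of it will be built; flatten (prune 2 (binTree 1)) only ever constructs the seven nodes it visits.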

Recently, I found myself doing Haskell-style programming in Python. I took over a monolithic function that extracted/computed/generated values and put them in a file sink, in one step.
I thought this was bad for understanding, reuse and testing. My plan was to separate value generation and value processing. In Haskell I would have generated a (lazy) list of those computed values in a pure function and would have done the post-processing in another (side-effect bearing) function.
Knowing that non-lazy lists in Python can be expensive if they tend to get big, I thought about the closest Python solution. To me that was to use a generator for the value generation step.
The Python code got much better thanks to my lazy (pun intended) mindset.
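The Haskell shape I had in mind looks roughly like this. It is only a sketch: the "computed values" are hypothetical (squares), and values, render and writeReport are names I made up:

```haskell
-- Pure, lazy producer: an unbounded stream of hypothetical computed values.
values :: [Integer]
values = map (\x -> x * x) [1 ..]

-- Pure formatting step, easy to test on its own.
render :: Integer -> String
render v = "value=" ++ show v

-- Side-effecting consumer: only the first n values are ever computed.
writeReport :: FilePath -> Int -> IO ()
writeReport path n = writeFile path (unlines (map render (take n values)))
```

The producer and the formatter can be tested without touching the file system, which was the whole point of the split.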

I'd expect bad habits.
I saw one of my coworkers try to use (hand-coded) lazy evaluation in our .NET project. Unfortunately, the lazy evaluation hid a bug: it would attempt remote invocations before main started executing, and thus outside the try/catch that handles the "Hey I can't connect to the internet" case.
Basically, the laziness hid the fact that something really expensive sat behind a property read, and so made it look like a good idea to do that read inside the type initializer.

Contextual information missing.
Laziness (or more specifically, the assumption that purity and equational reasoning are available) is sometimes quite useful for specific problem domains, but not necessarily better in general. If you're talking about general-purpose language settings, relying on lazy evaluation rules by default is considered harmful.
Analysis
Any language that has functional combination (or, more generally, combination of applicable terms, i.e. function call expressions, function-like macro invocations, FEXPRs, etc.) enforces rules on evaluation, implying an order among the subcomputations involved. For convenience and simplicity of specification, a language usually specifies these rules in a flavor paired with a reduction strategy:
Strict evaluation, or applicative-order reduction, which evaluates all subexpressions first, before the remaining subcomputation of the whole combination.
Non-strict evaluation, or normal-order reduction, which does not necessarily evaluate every subexpression first.
The remaining subcomputation finally determines the result of the whole evaluation of the expression. (For program-defined constructs, this usually implies the substitution of the evaluated argument into something like a function body, and the subsequent evaluation of the result.)
Lazy evaluation, or the call-by-need strategy, is a typical concrete instance of non-strict evaluation. To make it practically usable, subexpression evaluations are required to be pure (side-effect-free), so that the reductions implementing the strategy have the Church-Rosser property whatever order of subexpression evaluation is actually adopted.
One significant merit of such a design is the availability of equational reasoning: users can encode the equality of expression evaluations in the program, and an optimizing implementation of the language can perform transformations that depend directly on such constructs.
However, there are many serious problems behind such a design.
Equational reasoning is not as important in practice as it seems at first glance.
The encoding is not a separate feature. It has some specific requirements on the other features to carry the encoding. For a pure language, it is even more difficult to encode them elsewhere, so there is certain pressure to make the type system more expressive, hence more complicated typing and typechecking.
Whether the compiler uses the equational reasoning directly encoded in the program or not is an implementation detail. It is more of a taste of style to promote the importance.
Syntactic equations are not powerful enough to encode semantic conditions like cases of "unspecified behavior" in ISO C. Additional primitives are still needed to express the non-determinism of such semantic equivalence classes and make optimization techniques based on such equivalence possible.
It is computationally inefficient at the very basic level by default, and not easily amended by the programmer.
There is no systematic way to reduce the cost of equations that the programmer knows are not required.
One significant cost comes from the clash between lazily evaluated combinations and proper tail recursion over the combinations.
The unpredictable use of thunks to memoize lazily evaluated expressions also hampers the utilization of machine resources (e.g. registers and cache memory).
Purely functional languages like Haskell may declare that referential transparency is a Good Thing(TM). However, this is faulty in certain contexts.
There are semantic gaps over the terminology itself. The purity is not the only aspect for the referential transparency; moreover, there are other kinds of such property not readily provided by the evaluation strategy.
In general, referential transparency should not be a goal about programming. Instead, it is an optional manner to implement the composable components of programs. Composability is essentially about the expected invariance on the interface of the components. There are many ways to keep the composability without the aid of any kinds of referential transparency. Whether the guarantee should be enforced by the language rules? It depends. At least, it should not depend totally on the language designers' point.
The lack of impure evaluation requires more syntactic noise to encode many constructs that are simply expressible with mutable state cells in traditional impure languages. The workarounds for these practical problems make solutions more difficult for humans to reason about.
For example, I/O operations are side-effectful, thus not directly expressible in Haskell expressions under the usual non-strict evaluation rules; otherwise the order of effects would be non-deterministic.
To overcome this shortcoming, indirect conventional constructs such as the IO monad have been proposed to simulate the traditional imperative style. Such monadic constructs are essentially "indirect" in a sense similar to continuation-passing style, which is considerably low-level and difficult to read. Even though monads can be more "powerful" than continuations in expressiveness, they are not naturally more powerful than higher-level alternatives (like algebraic effect systems) when the lazy evaluation strategy is not enforced by default.
Besides the intuition problem above, the necessity of using monadic constructs is often difficult to prove formally (if at all possible). As a result, they are very easily abused (just like the design patterns for "OOP" languages derived from Simula). The related syntactic sugar, notably the famous do-notation, had been abused for a few decades before the abuse became well known in the Haskell community.
Simulating strict language constructs in languages like Haskell usually needs monadic constructs, while simulating non-strict constructs in strict languages is considerably simpler and easier to implement efficiently. For instance, there is SRFI-45.
The lazy evaluation strategy does not deal with many other non-strict constructs well.
For example, seq has to be compiler magic in GHC. It is not easily expressible by other Haskell constructs without massive changes to the core Haskell language rules.
Traditional strict languages also do not let user programs simulate the enforcement of order easily, so such sequential constructs are primitive there too (examples: the C-like ; is primitive; the derivation of Scheme's begin relies on the primitive lambda, which in turn implies an implicit evaluation order on expressions). However, sequencing can be implemented by reusing the applicative-order rules without additional ad-hoc primitives, like the derivation of the $sequence operator in the Kernel language.
Concerns about specific questions
Lazy evaluation is not a must for the "functional paradigm", though as mentioned above, purely functional languages are likely to have the lazy evaluation strategy by default. The common property is the usability of first-class functions. Impure languages like Lisp and the ML family, which use eager evaluation by default, are considered "functional". Also note that the popularity of the "functional paradigm" came after the introduction of function-level programming. The latter is quite different, but still somewhat similar to "functional programming" in its treatment of first-classness.
As mentioned above, the ways to simulate laziness in eager languages are well known. Additionally, for pure programs, there may be no non-trivial semantic difference between call-by-need and normal-order reduction. Figuring out something that really only works in a lazy world is actually not easy. (Do you want to implement the language?) Just go ahead.
Conclusion
Pay attention to the problem domain. Lazy evaluation may work well for specific scenarios. However, making it the default is likely to be a bad idea in general, because users (whoever uses the language to program, or to derive a new dialect from it) will likely have few chances to avoid all of the problems it causes.

Well, try to think of something that would work if lazily evaluated, that wouldn't if eagerly evaluated. The most common category of these would be lazy logical operator evaluation used to hide a "side effect". I'll use C#-ish language to explain, but functional languages would have similar analogs.
Take the simple C# lambda:
(a,b) => a==0 || ++b < 20
In a lazy-evaluated language, if a==0, the expression ++b < 20 is not evaluated (because the entire expression evaluates to true either way), which means that b is not incremented. In both imperative and functional languages, this behavior (and similar behavior of the AND operator) can be used to "hide" logic containing side effects that should not be executed:
(a,b) => a==0 && save(b)
"a" in this case may be the number of validation errors. If there were validation errors, the first half fails and the second half is not evaluated. If there were no validation errors, the second half is evaluated (which would include the side effect of trying to save b) and the result (apparently true or false) is returned to be evaluated. If either side evaluates to false, the lambda returns false indicating that b was not successfully saved. If this were evaluated "eagerly", we would try to save regardless of the value of "a", which would probably be bad if a nonzero "a" indicated that we shouldn't.
Side effects in functional languages are generally considered a no-no. However, there are few non-trivial programs that do not require at least one side effect; there's generally no other way to make a functional algorithm integrate with non-functional code, or with peripherals like a data store, display, network channel, etc.
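For comparison, a hedged Haskell sketch of the two halves of this example. In Haskell a short-circuiting "or" needs no special operator support, and the conditional save is written as an explicit IO action rather than hidden in a boolean expression. Both orElse and saveIfValid are hypothetical names, and the "save" here only prints:

```haskell
-- A user-defined "or": the second argument is forced only if the first is False.
orElse :: Bool -> Bool -> Bool
orElse True  _ = True
orElse False b = b

-- Hypothetical stand-in for save(b): runs the effect only when there are no errors.
saveIfValid :: Int -> String -> IO Bool
saveIfValid errorCount payload
  | errorCount == 0 = do
      putStrLn ("saving " ++ payload)
      pure True
  | otherwise = pure False
```

Here orElse True undefined evaluates to True without touching its second argument, and the type IO Bool makes it visible that saveIfValid may perform an effect.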

Related

What is the relationship between static typing and lazy functional languages? [closed]

I'm curious about the relationship between static typing and lazy functional languages. Is it possible to have a dynamic lazy functional language, for instance? It seems as if all of the lazy functional languages out there are statically typed (Haskell, Miranda, etc), and all of the dynamic functional languages use strict evaluation (Clojure, Scheme, etc).
In particular, the Wikipedia article on Lazy evaluation reads:
However, with lazy evaluation, it is difficult to combine with
imperative features such as exception handling and input/output,
because the order of operations becomes indeterminate. Lazy evaluation
can introduce space leaks.
What is the role that static typing plays in preventing space leaks?
I don't believe that static types play a role at all. For example, consider the untyped lazy language, Lazy Racket. I haven't heard any indication that it leaks space in a way that Haskell (for example) does not.
Side effects, on the other hand, are a problem because humans find the order of evaluation of strict evaluation to be (relatively) natural, and call-by-need is much harder to mentally predict.
What is the role that static typing plays in preventing space leaks?
Types can be used to track the lifetime of objects, statically ensuring the absence of leaks.
An example would be region types and other effect types.
Lazy evaluation and static typing are independent concepts.
Untyped
Lazy evaluation: the untyped lambda calculus with the normal-order reduction strategy (or call by need). The leftmost redex is always evaluated first.
Eager (strict) evaluation: the untyped lambda calculus with the applicative-order reduction strategy. Lambda terms are always reduced before they're substituted into other terms.
Typed
Lazy evaluation: Haskell is an example.
Eager evaluation: OCaml is an example.
Roughly speaking, evaluation is something that happens when the program is run.
Typing is something that happens when the program is compiled.
But of course there is an important correspondence between a typing system and an evaluation strategy:
If a term M reduces to N and M:σ (M is of type σ) then N:σ.
This means that when we run a program that has some type σ then the value
will have the same type. Without this property, a typing system is really
useless (at least for programming). This also means that once we have typed a
program during compilation, we don't need to remember the typing information
when evaluating it, because we know that the result will have the correct type.
Concerning the Wikipedia article you're quoting. There are two different things:
Imperative features. This isn't really related to a typing system. The problem is that if evaluating certain expressions has side effects (like I/O in most languages) in lazy settings it's very hard to predict when (if at all) a side effect occurs. Therefore once you have lazy evaluation, it's hardly possible to have an impure language.
One exception to this is the Clean language. It uses a special type system to handle side effects in a lazy setting. So here there is some connection between the evaluation strategy and the typing system through handling side effects: The type system allows handling side effects in such a way that we can keep lazy evaluation.
Space leaks. This is a known drawback of lazy evaluation. See Building up unevaluated expressions or Ch. 25 Profiling and optimization in Real World Haskell. But again this has nothing to do with type systems - you'd get the same behavior in an untyped language.
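The classic small illustration of "building up unevaluated expressions" is a lazy left fold. This is a sketch only; with optimizations enabled, GHC's strictness analysis may remove the leak from the first version on its own:

```haskell
import Data.List (foldl')

-- Lazy left fold: may build a long chain of unevaluated (+) thunks
-- before any addition actually runs.
sumLazy :: [Integer] -> Integer
sumLazy = foldl (+) 0

-- Strict left fold: forces the accumulator at each step.
sumStrict :: [Integer] -> Integer
sumStrict = foldl' (+) 0
```

Both return the same value; they differ only in how much memory the intermediate accumulator can occupy. Nothing in the types distinguishes the two, which fits the point that this is not a type-system issue.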
I think if you look at things from a more general level, it is possible to observe a natural relationship between static typing and lazy functional languages. The main point of static types is to inform and advance the capabilities of the compiler; surveying the static-dynamic divide among languages, it generally tracks the schism between compiled and interpreted code.
And what is the point of lazy evaluation?
An infamous, retrospective article by Peyton-Jones et al describes lazy evaluation as "the hair shirt" which kept the language purely functional. His metaphor aptly conveys the Haskell community's deep-rooted idealism of denotational semantics. Non-strict evaluation's fundamental benefit is that it transforms the possibilities for structuring code in ways that facilitate this denotational paradigm. In the notorious lazy evaluation debate carried on by Bob Harper and the Haskell community, Prof. Harper demonstrates the challenges lazy evaluation poses for practical programs - and among Lennart Augustsson's defenses of laziness, this one best illustrates the point:
"I've saved my biggest gripe of strict evaluation for last. Strict evaluation is fundamentally flawed for function reuse. [...] With strict evaluation you can no longer with a straight face tell people: don't use recursion, reuse the recursion patterns in map, filter, foldr, etc. It simply doesn't work (in general). [...] strict evaluation really, fundamentally stops you from reusing functions the way you can with laziness."
And for his example of function reuse via lazy evaluation, Augustsson offers, "It's quite natural to express the any function by reusing the map and or functions." So lazy evaluation emerges from this picture as a rather costly language feature embraced in service of a more natural style of coding.
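The reuse Augustsson describes can be sketched as follows (named anyVia here to avoid clashing with the Prelude's any):

```haskell
-- "any" expressed by reusing map and or.
-- Under lazy evaluation, or stops at the first True,
-- so map never processes the rest of the list.
anyVia :: (a -> Bool) -> [a] -> Bool
anyVia p = or . map p
```

Because the list is consumed on demand, anyVia even [1 ..] terminates even though the input is infinite. Under strict evaluation the same composition would first map over the entire list.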
What else do we need to sustain an abstract, denotational style of coding? A powerful optimizing compiler might come in handy! Thus, even if there is no technical or necessary connection between static types and lazy evaluation, the two features are oriented toward the same goal. It's not so surprising they often appear together.

What is the practical use for laziness as a built-in language feature?

It's fairly obvious why a functional programming language that wants to be lazy needs to be pure. I'm looking at the reverse question: if a language wants to be pure, is there a big advantage in being lazy? One argument, made by one of the designers of Haskell, is that it removes temptation; maybe, but I'm trying to weigh up the more concrete advantages.
Given that you want to do functional programming, what are the use cases where built-in laziness lets you express things more clearly, simply or concisely?
Stated simply: Why is laziness so important that you'd want to build it into the language?
(I'm looking for use cases more oriented towards an application rather than a demo - I know you can do things like producing an infinite list of prime numbers by filtering an infinite list of natural numbers, but who writes that ten times 'fore lunch...)
"Nothing is evaluated until it is needed at another place" is a simplified metaphor which doesn't cover all aspects of lazy evaluation (e.g. it doesn't mention the strictness phenomena).
From theoretical standpoint, there are 3 ways to go when designing a pure language (of course if it's based on some kind of lambda calculus and not on more exotic evaluation models): strict, non-strict and total.
Each of them has its advantages and disadvantages, so you need to read corresponding research papers.
Total languages are most pure of the three. In the other two the non-termination can be seen as a side effect, so strictness and totality analysers must be built to keep an implementation efficient. Both analyses are undecidable, so the analyzers can never be complete.
However, the total languages are least expressive: it's impossible for a total language to be Turing complete. A frequent approach to get good enough expressiveness is to have a built-in proof system for well-founded recursion, which is not much easier to build than the analyzers for non-total languages.
From a practical standpoint, non-strict semantics lets you more easily define control abstractions, as control structures are essentially non-strict. In a strict language you still need some places with non-strict semantics. E.g. the if construct has non-strict semantics even in strict languages.
So if your language is strict, control structures are a special case. In contrast, a non-strict language can be uniformly non-strict - it doesn't have an inherent need for strict constructs.
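To make that concrete, here is a small sketch of a user-defined conditional. myIf and safeDiv are illustrative names:

```haskell
-- An ordinary function that behaves like if/then/else:
-- only the selected branch is ever evaluated.
myIf :: Bool -> a -> a -> a
myIf True  t _ = t
myIf False _ e = e

-- The division is never evaluated when d == 0.
safeDiv :: Int -> Int -> Int
safeDiv n d = myIf (d == 0) 0 (n `div` d)
```

In a strict language, the equivalent of myIf would evaluate both branches before the call, so safeDiv 10 0 would fail with a division by zero.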
As for "who writes that ten times 'fore lunch" - anyone who uses Haskell for their projects does. I think developing a non-toy project using a language (a non-strict language in your case) is a best way to grasp its advantages and disadvantages.
Below are a few generic usecases for laziness illustrated by non-toy examples:
1. Cases when control flow is hard to predict. Think of attribute grammars, where without laziness you have to perform a topological sort on attributes to resolve the dependencies. Re-sorting your code every time the dependency graph changes is not practical. In Haskell you can implement the attribute grammar formalism without explicit sorting, and there are at least two actual implementations on Hackage. Attribute grammars have wide application in compiler construction.
2. The "generate and search" approach to solving many optimization problems. In a strict language you have to interleave generation and search; in Haskell you just compose separate generation and searching functions, and your code remains syntactically modular but interleaved at runtime. Think of the traveling salesman problem (TSP), where you generate all possible tours and then search through them using a branch-and-bound algorithm. Note that because branch-and-bound algorithms inspect only the first few cities of a tour, only the necessary parts of routes are generated. The TSP has several applications even in its purest formulation, such as planning, logistics, and the manufacture of microchips. Slightly modified, it appears as a sub-problem in many areas, such as DNA sequencing.
3. Lazy code has non-modular control flow, so a single function can have many possible control flows depending on the environment it executes in. This phenomenon can be seen as a kind of 'control flow polymorphism', so lazy control flow abstractions are more generic than their strict counterparts, and a standard library of higher-order functions is much more useful in a lazy language. Think of Python generators, loops and list iterators: in Haskell, list functions cover all three use cases, with control flow adapting to different usage scenarios because of laziness. It is not limited to lists - think of Data.Arrow and iteratees, lazy and strict versions of the State monad, etc. Also note that non-modular control flow is both an advantage and a disadvantage, as it makes reasoning about performance harder.
4. Lazy, possibly infinite data structures are useful beyond toy examples. See the works of Conal Elliott on memoizing higher-order functions using tries. Infinite data structures appear as infinite search spaces (see 2), infinite loops and never-exhausting generators in the Python sense (see 3).
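A toy-sized sketch of the "generate and search" composition (much simpler than TSP; the names triples, isPythagorean and firstTriples are made up for illustration):

```haskell
-- Generator: an infinite stream of candidates, ordered by the largest side.
triples :: [(Int, Int, Int)]
triples = [(a, b, c) | c <- [1 ..], b <- [1 .. c], a <- [1 .. b]]

-- Search predicate, written without any knowledge of the generator.
isPythagorean :: (Int, Int, Int) -> Bool
isPythagorean (a, b, c) = a * a + b * b == c * c

-- Composition: candidates are generated only as far as the search demands.
firstTriples :: Int -> [(Int, Int, Int)]
firstTriples n = take n (filter isPythagorean triples)
```

The generator and the search are written separately, but at runtime they interleave: firstTriples 2 stops generating as soon as the second match is found.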
Mac OS X's Core Image is a good practical example of lazy evaluation.
Basically, Core Image lets you create a directed acyclic graph of image generators and filters. No evaluation actually takes place until the last step in the process: materialization. When you request to materialize a Core Image graph, the final image frame is propagated backwards through the graph's transformations, thus minimizing the quantity of actual pixel values that need to be evaluated.
There's an extensive discussion of this point in Hughes's classic Why Functional Programming Matters. Therein, Hughes argues that laziness allows for improved modularity, using a number of accessible examples.

How about having a language provide both call-by-name and call-by-value?

Is it OK to have a language provide both call-by-need (CBN) and call-by-value (CBV) evaluation strategies? I mean without fixing one and simulating the other in it, but letting the user choose whichever is needed. For example, let the language have an eval function, as in Scheme, which accepts one more argument from the user specifying which evaluation strategy they want.
Combining call-by-need (laziness) and call-by-value (strictness) in one language implementation is certainly possible, as long as one takes care to avoid making computations with side effects lazy and making diverging computations strict.
Strictness analysis is used in lazy (CBN) functional languages to detect when functions can safely be evaluated using a CBV strategy. CBV evaluation is generally faster, but using this evaluation strategy for non-strict functions changes the semantics of the program.
Wadler describes how to combine lazy and strict computation in a functional language.
A Lambda the Ultimate thread also addresses the issue.
Scala has a keyword lazy for stating that certain computations are to be performed lazily. Other languages have similar constructs.
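Haskell goes the other way round: it is lazy by default and lets you opt into strictness, for example with seq. A minimal sketch (constLazy and constStrict are illustrative names):

```haskell
-- Lazy by default: the second argument is never evaluated.
constLazy :: a -> b -> a
constLazy x _ = x

-- Opt-in strictness: seq forces y (to weak head normal form)
-- before x is returned.
constStrict :: a -> b -> a
constStrict x y = y `seq` x
```

constLazy 1 undefined returns 1, while constStrict 1 undefined raises an exception, which is the semantic difference that strictness analysis has to respect.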

What does "pure" mean in "pure functional language"?

Haskell has been called a "pure functional language."
What does "pure" mean in this context? What consequences does this have for a programmer?
In a pure functional language, you can't do anything that has a side effect.
A side effect would mean that evaluating an expression changes some internal state that would later cause evaluating the same expression to have a different result. In a pure functional language you can evaluate the same expression as often as you want with the same arguments, and it would always return the same value, because there is no state to change.
For example, a pure functional language cannot have an assignment operator or do input/output, although for practical purposes, even pure functional languages often call impure libraries to do I/O.
"Pure" and "functional" are two separate concepts, although one is not very useful without the other.
A pure expression is idempotent: it can be evaluated any number of times, with identical results each time. This means the expression cannot have any observable side-effects. For example, if a function mutated its arguments, set a variable somewhere, or changed behavior based on something other than its input alone, then that function call is not pure.
A functional programming language is one in which functions are first-class. In other words, you can manipulate functions with exactly the same ease with which you can manipulate all other first-class values. For example, being able to use a "function returning bool" as a "data structure representing a set" would be easy in a functional programming language.
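The "function returning bool as a set" idea can be sketched like this. The names are chosen for illustration and are unrelated to Data.Set:

```haskell
-- A set is represented by its membership test.
type Set a = a -> Bool

emptySet :: Set a
emptySet = const False

insertSet :: Eq a => a -> Set a -> Set a
insertSet x s = \y -> y == x || s y

unionSet :: Set a -> Set a -> Set a
unionSet s t = \y -> s y || t y

memberSet :: a -> Set a -> Bool
memberSet x s = s x
```

Every operation just builds or applies a function, which only works smoothly because functions are first-class values.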
Programming in functional programming languages is often done in a mostly-pure style, and it is difficult to be strictly pure without higher-order function manipulation enabled by functional programming languages.
Haskell is a functional programming language, in which (almost) all expressions are pure; thus, Haskell is a purely functional programming language.
A pure function is one which has no side effects — it takes a value in and gives a value back. There's no global state that functions modify. A pure functional language is one which forces functions to be pure. Purity has a number of interesting consequences, such as the fact that evaluation can be lazy — since a function call has no purpose but to return a value, then you don't need to actually execute the function if you aren't going to use its value. Thanks to this, things like recursive functions on infinite lists are common in Haskell.
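A well-known small example of a recursive definition over an infinite list:

```haskell
-- The infinite list of Fibonacci numbers, defined in terms of itself.
-- Only the elements that are actually demanded get computed.
fibs :: [Integer]
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)
```

Asking for take 10 fibs computes just the first ten elements and leaves the rest of the list unevaluated.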
Another consequence is that it doesn't matter in what order functions are evaluated — since they can't affect each other, you can do them in any order that's convenient. This means that some of the problems posed by parallel programming simply don't exist, since there isn't a "wrong" or "right" order for functions to execute.
Strictly speaking, a pure functional language is a functional language (i.e. a language where functions are first-class values) where expressions have no side effects. The term “purely functional language” is synonymous.
By this definition, Haskell is not a pure functional language. Any language in which you can write programs that display their result, read and write files, have a GUI, and so on, is not purely functional. Thus no general purpose programming language is purely functional (but there are useful domain-specific purely functional languages: they can typically be seen as embedded languages in some way).
There is a useful relaxed sense in which languages like Haskell and Erlang can be considered purely functional, but languages like ML and Scheme cannot. A language can be considered purely functional if there is a reasonably large, useful and well-characterised subset where side effects are impossible. For example, in Haskell, all programs whose type is not built from IO or another effect-denoting monad are side-effect-free. In Erlang, all programs that don't use IO-inducing libraries or concurrency features are side-effect-free (this is more of a stretch than the Haskell case). Conversely, in ML or Scheme, a side effect can be buried in any function.
By this perspective, the purely functional subset of Haskell can be seen as the embedded language to deal with the behavior inside each monad (of course this is an odd perspective, as almost all the computation is happening in this “embedded” subset), and the purely functional subset of Erlang can be seen as the embedded language to deal with local behavior.
Graham Hutton has a slightly different, and quite interesting, perspective on the topic of purely functional languages:
Sometimes, the term “purely functional” is also used in a broader sense to mean languages that might incorporate computational effects, but without altering the notion of ‘function’ (as evidenced by the fact that the essential properties of functions are preserved.) Typically, the evaluation of an expression can yield a ‘task’, which is then executed separately to cause computational effects. The evaluation and execution phases are separated in such a way that the evaluation phase does not compromise the standard properties of expressions and functions. The input/output mechanisms of Haskell, for example, are of this kind.
I.e. in Haskell, a function has the type a -> b and can't have side effects. An expression of type IO (a -> b) can have side effects, but it's not a function. Thus in Haskell functions must be pure, hence Haskell is purely functional.
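The separation between evaluating an expression and executing the resulting task can be sketched like this. The names bump, plan and runPlan are hypothetical, and a mutable counter stands in for any observable effect.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- A pure function: given the same counter it always returns the same
-- action. Calling it performs no effect; it only describes one.
bump :: IORef Int -> IO ()
bump counter = modifyIORef' counter (+ 1)

-- Building a list of actions is ordinary pure evaluation; nothing runs yet.
plan :: IORef Int -> [IO ()]
plan counter = [bump counter, bump counter, bump counter]

-- Executing the actions is a separate step, and is itself an IO action.
runPlan :: IORef Int -> IO Int
runPlan counter = do
  sequence_ (plan counter)
  readIORef counter
```

Computing length (plan counter) leaves the counter untouched. It only changes once runPlan is executed as part of the program's main action.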
As there cannot be any side effects in pure functional code, testing gets much easier as there is no external state to check or verify. Also, because of this, extending code may become easier.
I lost count of the number of times I had trouble with non-obvious side effects when extending/fixing (or trying to fix) code.
As others have mentioned, the term "pure" in "pure functional programming language" refers to the lack of observable side-effects. For me, this leads straight to the question:
What is a side-effect?
I have seen side-effects explained both as
something that a function does other than simply compute its result
something that can affect the result of a function other than the inputs to the function.
If the first definition is the correct one, then any function that does I/O (e.g. writing to a file) cannot be said to be a "pure" function. Since Haskell programs can call functions which cause I/O to be performed, it would seem that Haskell is not a pure functional programming language (as it is claimed to be) according to this definition.
For this and other reasons, I think the second definition is the more useful one. According to the second definition, Haskell can still claim to be a completely pure functional programming language because functions that cause I/O to be performed compute results based only on function inputs. How Haskell reconciles these seemingly conflicting requirements is quite interesting, I think, but I'll resist the temptation to stray any further from answering the actual question.
Amr Sabry wrote a paper about what a pure functional language is. Haskell is by this definition considered pure, if we ignore things like unsafePerformIO. Using this definition also makes ML and Erlang impure. There are subsets of most languages that qualify as pure, but personally I don't think it's very useful to talk about C being a pure language.
Higher-orderness is orthogonal to purity; you can design a pure first-order functional language.

What functional language techniques can be used in imperative languages?

Which techniques or paradigms normally associated with functional languages can productively be used in imperative languages as well?
e.g.:
Recursion can be problematic in languages without tail-call optimization, limiting its use to a narrow set of cases, so that's of limited usefulness
Map and filter have found their way into non-functional languages, even though they have a functional sort of feel to them
I happen to really like not having to worry about state in functional languages. If I were particularly stubborn I might write C programs without modifying variables, only encapsulating my state in variables passed to functions and in values returned from functions.
Even though functions aren't first-class values, I can wrap one in an object in Java, say, and pass that into another method. Like functional programming, just less fun.
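The state-passing idea from the list above (no modified variables, state carried only in arguments and return values) might look like the following. It is written in Haskell rather than C, and Account, deposit, withdraw and applyDeposits are made-up names.

```haskell
-- Hypothetical example: an account balance threaded through functions
-- instead of being stored in a mutable variable.
newtype Account = Account { balance :: Int }
  deriving (Eq, Show)

-- Each operation takes the old state and returns the new one.
deposit :: Int -> Account -> Account
deposit amount (Account b) = Account (b + amount)

-- Failure is part of the return value rather than a side effect.
withdraw :: Int -> Account -> Maybe Account
withdraw amount (Account b)
  | amount <= b = Just (Account (b - amount))
  | otherwise   = Nothing

-- A loop over inputs becomes a fold that threads the state along.
applyDeposits :: [Int] -> Account -> Account
applyDeposits amounts acct = foldr deposit acct amounts
```

The same shape carries over to C if every function takes a struct by value and returns the updated struct.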
So, for veterans of functional programming, when you program in imperative languages, what ideas from FP have you applied successfully?
Pretty nearly all of them?
If you understand functional languages, you can write imperative programs that are "informed" by a functional style. That will lead you away from side effects, and toward programs in which reading the program text at any particular point is sufficient to let you really know what the meaning of the program is at that point.
Back at the Dawn of Time we used to worry about "coupling" and "cohesion". Learning an FP language will lead you to write systems with optimal (minimal) coupling, and high cohesion.
Here are things that get in the way of doing FP in a non-FP language:
If the language doesn't support lambdas/closures, and doesn't have any syntactic sugar to approximate them, you are dead in the water. You don't call map/filter without closures.
If the language is statically-typed and doesn't support generics, you are dead in the water. All the good FP stuff uses genericity.
If the language doesn't support tail-recursion, you are hindered. You can write implementations of e.g. 'map' iteratively; also often your data may not be too large and recursion will be ok.
If the language does not support algebraic data types and pattern-matching, you will be mildly hindered. It's just annoying not to have them once you've tasted them.
If the language cannot express type classes, well, oh well... you'll get by. But darn if that's not just the awesomest feature ever, and Haskell is the only remotely popular language with good support.
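For readers who haven't met the last two items, here is roughly what algebraic data types, pattern matching and type classes look like in Haskell. Shape, area, Describable and describeAll are hypothetical names chosen for this sketch.

```haskell
-- An algebraic data type: a value is exactly one of these alternatives.
data Shape
  = Circle Double
  | Rectangle Double Double
  deriving (Eq, Show)

-- Pattern matching does the case analysis on the constructor.
area :: Shape -> Double
area (Circle r)      = pi * r * r
area (Rectangle w h) = w * h

-- A type class declares an interface that existing types can join later.
class Describable a where
  describe :: a -> String

instance Describable Shape where
  describe (Circle _)      = "circle"
  describe (Rectangle _ _) = "rectangle"

instance Describable Bool where
  describe True  = "yes"
  describe False = "no"

-- Generic code that works for any Describable type.
describeAll :: Describable a => [a] -> [String]
describeAll = map describe
```

In a language without these features, the usual stand-ins are a class hierarchy with a visitor for the data type, and interfaces plus adapters for the type class.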
Not having first-class functions really puts a damper on writing functional programs, but there are a few things that you can do that don't require them. The first is to eschew mutable state - try to have most or all of your classes return new objects that represent the modified state instead of making the change internally. As an example, if you were writing a linked list with an add operation, you would want to return the new linked list from add as opposed to modifying the object.
While this may make your programs less efficient (due to the increased number of objects being created and destroyed) you will gain the ability to more easily debug the program because the state and operation of the objects becomes more predictable, not to mention the ability to nest function calls more deeply because they have state inputs and outputs.
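As a sketch of the linked-list example, here is the non-mutating add written in Haskell, where this style is the default. List, add and toList are hand-rolled names for illustration; the built-in list type behaves the same way.

```haskell
-- A hand-rolled singly linked list (hypothetical).
data List a = Nil | Cons a (List a)
  deriving (Eq, Show)

-- add returns a new list; the original is untouched and is reused,
-- not copied, as the tail of the result.
add :: a -> List a -> List a
add x xs = Cons x xs

-- Convert to a built-in list for easy inspection.
toList :: List a -> [a]
toList Nil         = []
toList (Cons x xs) = x : toList xs
```

Because the old list is shared as the tail of the new one, adding at the front allocates only a single cell, so the efficiency cost is smaller here than copying the whole structure would suggest.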
I've successfully used higher-order functions a lot, especially the kind that are passed in rather than the kind that are returned. The kind that are returned can be a bit tedious but can be simulated.
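The two kinds mentioned here, sketched in Haskell with made-up names (applyTwice and makeAdder):

```haskell
-- Passed in: the caller supplies the behaviour.
applyTwice :: (a -> a) -> a -> a
applyTwice f = f . f

-- Returned: a function that builds and returns another function,
-- capturing n in the process.
makeAdder :: Int -> (Int -> Int)
makeAdder n = \x -> x + n
```

For instance, applyTwice (* 2) 5 is 20 and map (makeAdder 1) [1, 2, 3] is [2,3,4]. In a language without closures, the returned kind is the one that needs simulating, typically with an object that stores n in a field.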
All sorts of applicative data structures and recursive functions work well in imperative languages.
The things I miss the most:
Almost no imperative languages guarantee to optimize every tail call.
I know of no imperative language that supports case analysis by pattern matching.

Resources