First, I understand the how of iteratees, well enough that I could probably write a simplistic and buggy implementation without referring back to any existing ones.
What I'd really like to know is why people seem to find them so fascinating, or under what circumstances their benefits justify their complexity. Comparing them to lazy I/O there is a very clear benefit, but that seems an awful lot like a straw man to me. I never felt comfortable about lazy I/O in the first place, and I avoid it except for the occasional hGetContents or readFile, mostly in very simple programs.
In real-world scenarios I generally use traditional I/O interfaces with control abstractions appropriate to the task. In that context I just don't see the benefit of iteratees, or to what task they are an appropriate control abstraction. Most of the time they seem more like unnecessary complexity or even a counterproductive inversion of control.
I've read a fair number of articles about them and sources that make use of them, but have not yet found a compelling example that actually made me think anything along the lines of "oh, yeah, I'd have used them there too." Maybe I just haven't read the right ones. Or perhaps there is a yet-to-be-devised interface, simpler than any I've yet seen, that would make them feel less like a Swiss Army Chainsaw.
Am I just suffering from not-invented-here syndrome or is my unease well-founded? Or is it perhaps something else entirely?
As to why people find them so fascinating, I think because they're such a simple idea. The recent discussion on Haskell-cafe about a denotational semantics for iteratees devolved into a consensus that they're so simple they're barely worth describing. The phrase "little more than a glorified left-fold with a pause button" sticks out to me from that thread. People who like Haskell tend to be fond of simple, elegant structures, so the iteratee idea is likely very appealing.
For me, the chief benefits of iteratees are
Composability. Not only can iteratees be composed, but enumerators can too. This is very powerful.
Safe resource usage. Resources (memory and handles mostly) cannot escape their local scope. Compare to strict I/O, where it's easier to create space leaks by not cleaning up.
Efficiency. Iteratees can be highly efficient: competitive with or better than both lazy I/O and strict I/O.
I have found that iteratees provide the greatest benefits when working with a single logical piece of data that comes from multiple sources. This is when their composability is most helpful, and when resource management with strict I/O is most annoying (e.g. nested allocas or brackets).
For an example, in a work-in-progress audio editor, a single logical chunk of sound data is a set of offsets into multiple audio files. I can process that single chunk of sound by doing something like this (from memory, but I think this is right):
enumSound :: MonadIO m => Sound -> Enumerator s m a
enumSound snd = foldr (>=>) enumEof . map enumFile $ sndFiles snd
This seems clear, concise, and elegant to me, much more so than the equivalent strict I/O. Iteratees are also powerful enough to incorporate any processing I want to do, including writing output, so I find this very nice. If I used lazy I/O I could get something as elegant, but the extra care to make sure resources are consumed and GC'd would outweigh the advantages IMO.
I also like that you need to explicitly retain data in iteratees, which avoids the notorious mean xs = sum xs / length xs space leak.
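For reference, here is that leak and a conventional one-pass fix (a sketch of my own; note the leaky version needs a fromIntegral to typecheck):

import Data.List (foldl')

-- The notorious leak: sum and length are two separate traversals,
-- so the entire list must be kept alive between them.
mean :: [Double] -> Double
mean xs = sum xs / fromIntegral (length xs)

-- A one-pass version: both accumulators are forced at every step,
-- so the list can be consumed and collected incrementally.
mean' :: [Double] -> Double
mean' xs = s / fromIntegral n
  where
    (s, n) = foldl' step (0, 0 :: Int) xs
    step (acc, len) x = let acc' = acc + x
                            len' = len + 1
                        in acc' `seq` len' `seq` (acc', len')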
Of course, I don't use iteratees for everything. As an alternative I really like the with* idiom, but when you have multiple resources that need to be nested that gets complex very quickly.
Essentially, it's about doing IO in a functional style, correctly and efficiently. That's all, really.
Correct and efficient are easy enough using quasi-imperative style with strict IO. Functional style is easy with lazy IO, but it's technically cheating (using unsafeInterleaveIO under the hood) and can have issues with resource management and efficiency.
In very, very general terms, a lot of pure functional code follows a pattern of taking some data, recursively expanding it into smaller pieces, transforming the pieces in some fashion, then recombining it into a final result. The structure may be implicit (in the call graph of the program) or an explicit data structure being traversed.
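As a small pure illustration of that shape (my own example):

import Data.List (unfoldr)

-- expand a seed into pieces, transform each piece, recombine:
sumOfSquares :: Int -> Int
sumOfSquares n = sum               -- recombine
               . map (^ 2)         -- transform
               $ unfoldr step 1    -- recursively expand
  where
    step i | i > n     = Nothing
           | otherwise = Just (i, i + 1)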
But this falls apart when IO is involved. Say your initial data is a file handle, the "recursively expand" step is reading a line from it, and you can't read the entire file into memory at once. This forces the entire read-transform-recombine process to be done for each line before reading the next one, so instead of the clean "unfold, map, fold" structure, the steps get mashed together into explicitly recursive monadic functions using strict IO.
Iteratees provide an alternative structure to solve the same problem. The "transform and recombine" steps are extracted and, instead of being functions, are changed into a data structure representing the current state of the computation. The "recursively expand" step is given the responsibility of obtaining the data and feeding it to an (otherwise passive) iteratee.
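A minimal sketch of that data structure, simplified and not matching any particular library's API:

-- an iteratee is either finished or paused, waiting for more input
-- (Nothing signals end-of-stream); an enumerator drives it by feeding
-- it values one at a time
data Iteratee a b
  = Done b
  | Cont (Maybe a -> Iteratee a b)

-- a toy enumerator over a list, for illustration
enumList :: [a] -> Iteratee a b -> Iteratee a b
enumList _ it@(Done _) = it
enumList [] it = it
enumList (x:xs) (Cont k) = enumList xs (k (Just x))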
What benefits does this offer? Among other things:
Because an iteratee is a passive object that performs single steps of a computation, they can be easily composed in different ways--for instance, interleaving two iteratees instead of running them sequentially.
The interface between iteratees and enumerators is pure, just a stream of values being processed, so a pure function can be freely spliced in between them.
Data sources and computations are oblivious to each other's internal workings, decoupling input and resource management from processing and output.
The end result is that a program can have a high-level structure much closer to what a pure functional version would look like, with many of the same benefits to compositionality, while simultaneously having efficiency comparable to the more imperative, strict IO version.
As for being "worth the complexity"? Well, that's the thing--they're really not that complex, just a bit new and unfamiliar. The idea's been floating around for only, what, a couple years? Give it some time for things to shake out as people use iteratee-based IO in larger projects (e.g., with things like Snap), and for more examples/tutorials to appear. It's likely that, in hindsight, the current implementations will seem very rough around the edges.
Somewhat related: you may want to read this discussion about functional-style IO. Iteratees aren't mentioned all that much, but the central issue is very similar. In particular, consider this solution, which is very elegant and goes even further than iteratees in abstracting incremental IO.
under what circumstances their benefits justify their complexity
Every language has strict (classical) IO, where all resources are managed by the user. Haskell also provides ubiquitous lazy IO, where all resource management is delegated to the system.
However, that can create problems, as the scope of resources is dependent on runtime demand properties.
Iteratees strike a third way:
High level abstractions, like lazy IO.
Explicit, lexical scoping of resources, like strict IO.
It is justified when you have complex IO processing tasks, but very tight bounds on resource use. An example is a web server.
Indeed, Snap is built around iteratee IO on top of epoll.
Related
I'm an intermediate Haskell programmer with tons of experience in strict FP and non-FP languages. Most of my Haskell code analyzes moderately large datasets (10^6..10^9 things), so laziness is always lurking. I have a reasonably good understanding of thunks, WHNF, pattern matching, and sharing, and I've been able to fix leaks with bang patterns and seq, but this profile-and-pray approach feels sordid and wrong.
I want to know how experienced Haskell programmers approach laziness at design time. I'm not asking about easy items like Data.ByteString.Lazy or foldl'; rather, I want to know how you think about the lower-level lazy machinery that causes runtime memory problems and tricky debugging.
How do you think about thunks, pattern matching, and sharing during design time?
What design patterns and idioms do you use to avoid leaks?
How did you learn these patterns and idioms, and do you have some good refs?
How do you avoid premature optimization of non-leaking non-problems?
(Amended 2014-05-15 for time budgeting):
Do you budget substantial project time for finding and fixing memory problems?
Or, do your design skills typically circumvent memory problems, and you get the expected memory consumption very early in the development cycle?
I think most of the trouble with "strictness leaks" happens because people don't have a good conceptual model. Haskellers without a good conceptual model tend to have and propagate the superstition that stricter is better. Perhaps this intuition comes from their results from toying with small examples & tight loops. But it is incorrect. It's just as important to be lazy at the right times as to be strict at the right times.
There are two camps of data types, usually referred to as "data" and "codata". It is essential to respect the patterns of each one.
Operations which produce "data" (Int, ByteString, ...) must be forced close to where they occur. If I add a number to an accumulator, I am careful to make sure that it will be forced before I add another one. A good understanding of laziness is very important here, especially its conditional nature (i.e. strictness propositions don't take the form "X gets evaluated" but rather "when Y is evaluated, so is X").
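A tiny example of that care with an accumulator (my own illustration):

-- leaky: go builds a chain of suspended (+) thunks as long as the list
-- (at least without optimization)
sumLeaky :: [Int] -> Int
sumLeaky = go 0
  where go acc []     = acc
        go acc (x:xs) = go (acc + x) xs

-- forced: "when the recursive call is evaluated, so is acc'"
sumStrict :: [Int] -> Int
sumStrict = go 0
  where go acc []     = acc
        go acc (x:xs) = let acc' = acc + x in acc' `seq` go acc' xs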
Operations which produce and consume "codata" (lists most of the time, trees, most other recursive types) must do so incrementally. Usually a codata -> codata transformation should produce some output for each bit of input it consumes (modulo skipping, as with filter). Another important rule for codata is to use it linearly whenever possible -- i.e. use the tail of a list exactly once; use each branch of a tree exactly once. This ensures that the GC can collect pieces as they are consumed.
Things need special care when you have codata that contains data. E.g. iterate (+1) 0 !! 1000 will end up building a size-1000 thunk before evaluating it. You need to think about conditional strictness again -- the way to prevent this is to ensure that when a cons of the list is consumed, the addition of its element occurs. iterate violates this, so we need a better version.
iterate' :: (a -> a) -> a -> [a]
iterate' f x = x : (x `seq` iterate' f (f x))
As you start composing things, of course it gets harder to tell when bad cases happen. In general it is hard to make efficient data structures / functions that work equally well on data and codata, and it's important to keep in mind which is which (even in a polymorphic setting where it's not guaranteed, you should have one in mind and try to respect it).
Sharing is tricky, and I think I approach it mostly on a case-by-case basis. Because it's tricky, I try to keep it localized, choosing not to expose large data structures to module users in general. This can usually be done by exposing combinators for generating the thing in question, and then producing and consuming it all in one go (the codensity transformation on monads is an example of this).
My design goal is to get every function to be respectful of the data / codata patterns of my types. I can usually hit it (though sometimes it requires some heavy thought -- it has become natural over the years), and I seldom have leak problems when I do. But I don't claim that it's easy -- it requires experience with the canonical libraries and patterns of the language. These decisions are not made in isolation, and everything has to be right at once for it to work well. One poorly tuned instrument can ruin the whole concert (which is why "optimization by random perturbation" almost never works for these kinds of issues).
Apfelmus's Space Invariants article is helpful for developing your space/thunk intuition further. Also see Edward Kmett's comment below.
Does anyone know if it is possible to do lock-free programming in Haskell? I'm interested both in whether the appropriate low-level primitives are available and, if they are, in any information on what works when using them to build larger-scale systems in a pure functional context. (I've never done lock-free programming in a pure functional context before.) For instance, as I understand it, the Control.Concurrent.Chan channels are built on top of MVars, which (as I understand things) use locks -- could one in principle build versions of the Chan primitive which are lock-free internally? How much performance gain might one hope to get?
I should also say that I'm familiar with the existence of TVars, but don't understand their internal implementation -- I've been given to understand that they are mostly lock-free, but I'm not sure if they're entirely lock-free. So any information on the internal implementation of TVars would also be helpful!
(This thread provides some discussion, but I wonder if there's anything more up to date/more comprehensive.)
Not only does an MVar use locks, it is a lock abstraction. And, as I recall, individual STM primitives are optimistic, but there are locks used in various places in the STM implementation. Just remember the handy rhyme: "If it can block, then beware of locks".
For real lock-free programming you want to use IORefs directly, along with atomicModifyIORef.
Edit: regarding black holes, as I recall the implementation is lock free, but I can't vouch for the details. The mechanism is described in "Runtime Support for Multicore Haskell": http://research.microsoft.com/en-us/um/people/simonpj/papers/parallel/multicore-ghc.pdf
But that implementation underwent some tweaks, I think, as described in Simon Marlow's 2010 Haskell Implementors Workshop talk "Scheduling Lazy Evaluation on Multicore": http://haskell.org/haskellwiki/HaskellImplementorsWorkshop/2010. The slides are unfortunately offline, but the video should still work.
Lock-free programming is trivial in Haskell. The easiest way to have a shared piece of data that needs to be modified by many threads is to start with any normal Haskell type (list, Map, Maybe, whatever you need) and place it in an IORef. Once you've done this, you can use atomicModifyIORef to perform modifications in place, which are guaranteed to take next to no time.
import Data.IORef

type MyDataStructure = [Int]
type ConcMyData = IORef MyDataStructure

main :: IO ()
main = do
    sharedData <- newIORef ([] :: MyDataStructure)
    ...
    atomicModifyIORef sharedData (\xs -> (1 : xs, ()))
The reason this works is that what's stored in the IORef is a pointer to the thunk that will eventually evaluate to the result; whenever a thread reads from the IORef, it gets that thunk and evaluates as much of the structure as it needs. Since all threads could read this same thunk, it will only be evaluated once (and if it is evaluated more than once, it's guaranteed to always end up with the same result, so concurrent evaluations are OK). I believe this is correct; I'm happy to be corrected, though.
The take-home message is that this sort of abstraction is only easily implemented in a pure language, where values never change (except, of course, when they do, with types like IORef, MVar, and the STM types). The persistent, copy-on-write nature of Haskell's data structures means that modified structures can share a lot of data with the original structure, while allocating only whatever is new to the structure.
I don't think I've done a very good job explaining how this works, but I'll come back tomorrow and clarify my answer.
For more information, see the slides for the talk Multicore programming in Haskell by Simon Marlow of Microsoft Research (and one of the main GHC implementors).
Look into stm, specifically its TChan type.
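A minimal usage sketch (the example scenario is mine, but these are the actual stm functions):

import Control.Concurrent (forkIO)
import Control.Concurrent.STM (atomically, newTChanIO, readTChan, writeTChan)

main :: IO ()
main = do
  chan <- newTChanIO
  _ <- forkIO (atomically (writeTChan chan "hello from a worker thread"))
  msg <- atomically (readTChan chan)  -- retries until a value is available
  putStrLn msg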
There has been some talk at work about making it a department-wide policy of prohibiting the use of unsafePerformIO and its ilk. Personally, I don't really mind as I've always maintained that if I found myself wanting to use it, it usually meant that I need to rethink my approach.
Does this restriction sound reasonable? I seem to remember reading somewhere that it was included mainly for FFI, but I can't remember where I read that at the moment.
edit:
OK, that's my fault. It wouldn't be restricted where it's reasonably needed, i.e. FFI. The point of the policy is more to discourage laziness and code smells.
A lot of core libraries like ByteString use unsafePerformIO under the hood, for example to customize memory allocation.
When you use such a library, you're trusting that the library author has proven the referential transparency of their exported API, and that any necessary preconditions for the user are documented. Rather than a blanket ban, your department should establish a policy and a review process for making similar assurances internally.
Well, there are valid uses for unsafePerformIO. It's not there just to be decorative, or as a temptation to test your virtue. None of those uses, however, involve adding meaningful side effects to everyday code. Here are a few examples of uses that can potentially be justified, with varying degrees of suspicion:
Wrapping a function that's impure internally, but has no externally observable side effects. This is the same basic idea as the ST monad, except that here the burden is on the programmer to show that the impurity doesn't "leak".
Disguising a function that's deliberately impure in some restricted way. For instance, write-only impurity looks the same as total purity "from the inside", since there's no way to observe the output that's produced. This can be useful for some kinds of logging or debugging, where you explicitly don't want the consistency and well-defined ordering required by the IO monad. An example of this is Debug.Trace.trace, which I sometimes refer to as unsafePerformPrintfDebugging.
Introspection on pure computations, producing a pure result. A classic example is something like the unambiguous choice operator, which can run two equivalent pure functions in parallel in order to get an answer quicker.
Internally unobservable breaking of referential transparency, such as introducing nondeterminism when initializing data. As long as each impure function is evaluated only once, referential transparency will be effectively preserved during any single run of the program, even if the same faux-pure function called with the same arguments gives different results on different runs.
The important thing to note about all of the above is that the resulting impurity is carefully controlled and limited in scope. Given a more fine-grained system of controlling side-effects than the all-purpose IO monad, these would all be obvious candidates for slicing off bits of semi-purity, much like the controlled mutable state in the aforementioned ST monad.
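For instance, Debug.Trace.trace from the second example is an ordinary library function built on exactly this idea:

import Debug.Trace (trace)

-- trace :: String -> a -> a
-- prints its message when (and only when) the wrapped value is forced;
-- write-only impurity disguised as purity
fib :: Int -> Int
fib n
  | n < 2     = n
  | otherwise = trace ("fib " ++ show n) (fib (n - 1) + fib (n - 2))

main :: IO ()
main = print (fib 5)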
Post scriptum: if a hard-line stance against any non-required use of unsafePerformIO is being considered, I strongly encourage extending the prohibition to include unsafeInterleaveIO and any functions that allow observation of its behavior. It's at least as sketchy as some of the unsafePerformIO examples I listed above, if you ask me.
unsafePerformIO is the runST of the IO monad. It is sometimes essential. However, unlike runST, the compiler cannot check that you are preserving referential transparency.
So if you use it, you carry the burden of explaining why the use is safe. It shouldn't be banned; it should be accompanied by evidence.
Outlawing unsafePerformIO in "application" code is an excellent idea. In my opinion there is no excuse for unsafePerformIO in normal code and in my experience it is not needed. It is really not part of the language so you are not really programming in Haskell any more if you use it. How do you know what it even means?
On the other hand, using unsafePerformIO in an FFI binding is reasonable if you know what you are doing.
Outlawing unsafePerformIO is a terrible idea, because it effectively locks code into the IO monad: for example, a C library binding will almost always be in the IO monad -- however, using unsafePerformIO, a higher-level purely functional library can be built on top of it.
Arguably, unsafePerformIO reflects the compromise between the highly stateful model of the personal computer and the pure, stateless model of Haskell; even a function call is stateful from the computer's point of view, since it requires pushing arguments onto a stack, messing with registers, etc., but the usage is based on the knowledge that these operations do in fact compose functionally.
I'm looking to learn functional programming with either Haskell or F#.
Are there any programming habits (good or bad) that could form as a result of Haskell's lazy evaluation? I like the idea of Haskell's functional purity for the purposes of understanding functional programming. I'm just a bit worried about two things:
I may misinterpret lazy-evaluation-based features as being part of the "functional paradigm".
I may develop thought patterns that work in a lazy world but not in a normal order/eager evaluation world.
There are habits that you get into when programming in a lazy language that don't work in a strict language. Some of these seem so natural to Haskell programmers that they don't think of them as lazy evaluation. A couple of examples off the top of my head:
f x y = if x > y then .. a .. b .. else c
  where
    a = expensive
    b = expensive
    c = expensive
here we define a bunch of subexpressions in a where clause, with complete disregard for which of them will ever be evaluated. It doesn't matter: the compiler will ensure that no unnecessary work is performed at runtime. Non-strict semantics means that the compiler is able to do this. Whenever I write in a strict language I trip over this a lot.
Another example that springs to mind is "numbering things":
pairs = zip xs [1..]
here we just want to associate each element in a list with its index, and zipping with the infinite list [1..] is the natural way to do it in Haskell. How do you write this without an infinite list? Well, the fold isn't too readable
pairs = foldr (\x xs -> \n -> (x,n) : xs (n+1)) (const []) xs 1
or you could write it with explicit recursion (too verbose, doesn't fuse). There are several other ways to write it, none of which are as simple and clear as the zip.
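For comparison, the explicit-recursion version would look something like this (written as a standalone function):

pairs :: [a] -> [(a, Int)]
pairs = go 1
  where
    go _ []     = []
    go n (x:xs) = (x, n) : go (n + 1) xs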
I'm sure there are many more. Laziness is surprisingly useful, when you get used to it.
You'll certainly learn about evaluation strategies. Non-strict evaluation strategies can be very powerful for particular kinds of programming problems, and once you're exposed to them, you may be frustrated that you can't use them in some language setting.
I may develop thought patterns that work in a lazy world but not in a normal order/eager evaluation world.
Right. You'll be a more rounded programmer. Abstractions that provide "delaying" mechanisms are fairly common now, so you'd be a worse programmer not to know them.
I may misinterpret lazy-evaluation-based features as being part of the "functional paradigm".
Lazy evaluation is an important part of the functional paradigm. It's not a requirement - you can program functionally with eager evaluation - but it's a tool that naturally fits functional programming.
You see people explicitly implement/invoke it (notably in the form of lazy sequences) in languages that don't make it the default; and while mixing it with imperative code requires caution, pure functional code allows safe use of laziness. And since laziness makes many constructs cleaner and more natural, it's a great fit!
(Disclaimer: no Haskell or F# experience)
To expand on Beni's answer: if we ignore operational aspects in terms of efficiency (and stick with a purely functional world for the moment), every terminating expression under eager evaluation is also terminating under non-strict evaluation, and the values of both (their denotations) coincide.
This is to say that lazy evaluation is strictly more expressive than eager evaluation. By allowing you to write more correct and useful expressions, it expands your "vocabulary" and ability to think functionally.
Here's one example of why:
A language can be lazy by default with optional eagerness, or eager by default with optional laziness, but in fact it's been shown (cf. Okasaki, for example) that there are certain purely functional data structures which can only achieve certain orders of performance if implemented in a language that provides laziness, either optionally or by default.
Now when you do want to worry about efficiency, then the difference does matter, and sometimes you will want to be strict and sometimes you won't.
But worrying about strictness is a good thing, because very often the cleanest thing to do (and not only in a lazy-by-default language) is to use a thoughtful mix of lazy and eager evaluation, and thinking along these lines will be a good thing no matter which language you wind up using in the future.
Edit: Inspired by Simon's post, one additional point: many problems are most naturally thought about as traversals of infinite structures rather than as basically recursive or iterative. (Although such traversals themselves will generally involve some sort of recursive call.) Even for finite structures, very often you only want to explore a small portion of a potentially large tree. Generally speaking, non-strict evaluation lets you stop conflating the operational issue of what the processor actually bothers to compute with the semantic issue of the most natural way to represent the structure you're using.
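A trivial example of that separation (mine): describe the whole infinite structure once, and let demand decide how much of it is ever computed.

-- the semantic structure: all the squares, as a single value
squares :: [Integer]
squares = map (^ 2) [1 ..]

-- the operational question -- how much gets computed -- is decided
-- entirely by the consumer
main :: IO ()
main = print (takeWhile (< 50) squares)  -- [1,4,9,16,25,36,49]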
Recently, I found myself doing Haskell-style programming in Python. I took over a monolithic function that extracted/computed/generated values and put them in a file sink, all in one step.
I thought this was bad for understanding, reuse, and testing. My plan was to separate value generation from value processing. In Haskell I would have generated a (lazy) list of the computed values in a pure function and done the post-processing in another (side-effect-bearing) function.
Knowing that non-lazy lists in Python can be expensive if they get big, I looked for the closest Python equivalent: a generator for the value-generation step.
The Python code got much better thanks to my lazy (pun intended) mindset.
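In Haskell terms, the plan would look something like this (a sketch with made-up names):

import System.IO (Handle, hPrint, stdout)

-- pure, lazy value generation (the computation here is a stand-in)
computeValues :: [Int] -> [Int]
computeValues = map (* 2) . filter even

-- side-effecting post-processing, kept separate
writeSink :: Handle -> [Int] -> IO ()
writeSink h = mapM_ (hPrint h)

main :: IO ()
main = writeSink stdout (computeValues [1 .. 20])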
I'd expect bad habits.
I saw one of my coworkers try to use (hand-coded) lazy evaluation in our .NET project. Unfortunately, the lazy evaluation hid a bug where remote invocations were attempted before the start of main executed, and thus outside the try/catch meant to handle the "Hey, I can't connect to the internet" case.
Basically, the laziness hid the fact that something really expensive was lurking behind a property read, which made it look like a good idea to use inside the type initializer.
Contextual information missing.
Laziness (or more specifically, the assumed availability of purity and equational reasoning) is sometimes quite useful for specific problem domains, but not necessarily better in general. For general-purpose language settings, relying on lazy evaluation rules by default is considered harmful.
Analysis
Any language with combinations of applicable terms (function call expressions, function-like macro invocations, fexprs, etc.) enforces rules on evaluation that imply an order among the subcomputations involved. For convenience and simplicity of specification, a language usually specifies these rules in a flavor paired with a reduction strategy:
Strict evaluation, or applicative-order reduction, which evaluates every subexpression first, before the remaining evaluation of the whole combination.
Non-strict evaluation, or normal-order reduction, which does not necessarily evaluate every subexpression first.
The remaining subcomputation finally determines the result of the whole evaluation of the expression. (For program-defined constructs, this usually means substituting the evaluated arguments into something like a function body and then evaluating the result.)
Lazy evaluation, or the call-by-need strategy, is the typical concrete instance of non-strict evaluation. To make it practically usable, subexpression evaluations are required to be pure (side-effect-free), so that the reductions implementing the strategy have the Church-Rosser property regardless of which order of subexpression evaluation is actually adopted.
One significant merit of such a design is the availability of equational reasoning: users can encode equalities between expression evaluations in the program, and an optimizing implementation of the language can perform transformations that rely directly on such constructs.
However, there are many serious problems behind such a design.
Equational reasoning is not as important in practice as it seems at first glance.
The encoding is not a standalone feature; it places specific requirements on other features to carry it. In a pure language it is even harder to encode such things elsewhere, so there is pressure to make the type system more expressive, hence more complicated typing and typechecking.
Whether the compiler actually uses the equational reasoning encoded in the program is an implementation detail; promoting its importance is largely a matter of taste.
Syntactic equations are not powerful enough to encode semantic conditions like the cases of "unspecified behavior" in ISO C. Additional primitives are still needed to express the non-determinism of such semantic equivalence classes before optimization techniques based on that equivalence become possible.
It is computationally inefficient at the most basic level by default, and not easily remedied by the programmer.
There is no systematic way to avoid the cost of equations the programmer knows are not needed.
One significant cost comes from the clash between lazily evaluated combinations and proper tail calls over those combinations.
The unpredictable buildup of thunks memoizing lazily evaluated expressions also causes trouble for the utilization of machine resources (e.g. registers and cache memory).
Purely functional languages like Haskell may declare referential transparency a Good Thing™. However, this is faulty in certain contexts.
There are semantic gaps in the terminology itself. Purity is not the only aspect of referential transparency; moreover, there are other kinds of such properties not readily provided by the evaluation strategy.
In general, referential transparency should not be a goal of programming. Rather, it is one optional way to implement composable program components. Composability is essentially about the expected invariants on a component's interface, and there are many ways to preserve composability without any kind of referential transparency. Should the guarantee be enforced by the language rules? It depends; at the very least, it should not rest entirely on the language designers' preferences.
The lack of impure evaluation requires more syntactic noise to encode many constructs that are simply expressible with mutable state cells in traditional impure languages. The workarounds for the practical problems make solutions more convoluted and harder for humans to reason about.
For example, I/O operations are side-effectful and thus not directly expressible in Haskell expressions under the usual non-strict evaluation rules; otherwise the order of effects would be non-deterministic.
To overcome this shortcoming, indirect conventional constructs like the IO monad were proposed to simulate the traditional imperative style. Such monadic constructs are essentially "indirect" in a sense similar to continuation-passing style, which is considerably low-level and difficult to read. Even though monads can be more "powerful" than continuations in expressiveness, they are not naturally more powerful than higher-level alternatives (like algebraic effect systems) when the lazy evaluation strategy is not enforced by default.
Besides the intuition problem above, the necessity of monadic constructs is often difficult to prove formally (if that is possible at all). As a result, they are very easily abused (just like the design patterns of "OOP" languages derived from Simula). The related syntactic sugar, notably the famous do-notation, has been abused for decades, as the Haskell community well knows.
Simulating strict constructs in languages like Haskell usually requires monadic constructs, while simulating non-strict constructs in strict languages is considerably simpler and easier to implement efficiently. For instance, there is SRFI-45.
The lazy evaluation strategy does not deal well with many other non-strict constructs.
For example, seq has to be compiler magic in GHC. It is not expressible in terms of other Haskell constructs without massive changes to the core language rules.
Although traditional strict languages also do not let user programs simulate the enforcement of evaluation order easily, so such sequential constructs are primitive there as well (examples: the C-like ; is primitive; the derivation of Scheme's begin relies on the primitive lambda, which in turn implies an implicit evaluation order on expressions), order enforcement can be implemented by reusing the applicative-order rules without additional ad-hoc primitives, as in the derivation of the $sequence operator in the Kernel language.
Concerns about specific questions
Lazy evaluation is not a must for the "functional paradigm", though, as mentioned above, purely functional languages are likely to have lazy evaluation by default. The common property is the usability of first-class functions. Impure languages like those in the Lisp and ML families are considered "functional" and use eager evaluation by default. Also note that the popularity of the "functional paradigm" came after the introduction of function-level programming; the latter is quite different, but still somewhat similar to "functional programming" in its treatment of first-classness.
As mentioned above, ways to simulate laziness in eager languages are well known. Additionally, for pure programs, there may be no non-trivial semantic difference between call-by-need and normal-order reduction. Figuring out something that really only works in a lazy world is actually not easy. (Do you want to implement the language? Just go ahead.)
Conclusion
Be mindful of the problem domain. Lazy evaluation may work well for specific scenarios. However, making it the default is likely a bad idea in general, because users (whoever uses the language to program, or to derive a new dialect from it) will have few chances to ignore all of the problems it causes.
Well, try to think of something that would work if lazily evaluated but wouldn't if eagerly evaluated. The most common category of these is lazy logical-operator evaluation used to hide a "side effect". I'll use C#-ish language to explain, but functional languages have similar analogs.
Take the simple C# lambda:
(a,b) => a==0 || ++b < 20
In a lazily evaluated language, if a==0, the expression ++b < 20 is not evaluated (because the entire expression evaluates to true either way), which means that b is not incremented. In both imperative and functional languages, this behavior (and the similar behavior of the AND operator) can be used to "hide" logic containing side effects that should not be executed:
(a,b) => a==0 && save(b)
"a" in this case may be the number of validation errors. If there were validation errors, the first half fails and the second half is not evaluated. If there were no validation errors, the second half is evaluated (which would include the side effect of trying to save b) and the result (apparently true or false) is returned to be evaluated. If either side evaluates to false, the lambda returns false indicating that b was not successfully saved. If this were evaluated "eagerly", we would try to save regardless of the value of "a", which would probably be bad if a nonzero "a" indicated that we shouldn't.
Side effects in functional languages are generally considered a no-no. However, there are few non-trivial programs that do not require at least one side effect; there's generally no other way to make a functional algorithm integrate with non-functional code, or with peripherals like a data store, display, network channel, etc.
Which techniques or paradigms normally associated with functional languages can productively be used in imperative languages as well?
e.g.:
Recursion can be problematic in languages without tail-call optimization, limiting its use to a narrow set of cases, so that's of limited usefulness
Map and filter have found their way into non-functional languages, even though they have a functional sort of feel to them
I happen to really like not having to worry about state in functional languages. If I were particularly stubborn I might write C programs without modifying variables, only encapsulating my state in variables passed to functions and in values returned from functions.
Even though functions aren't first class values, I can wrap one in an object in Java say, and pass that into another method. Like Functional programming, just less fun.
So, for veterans of functional programming, when you program in imperative languages, what ideas from FP have you applied successfully?
Pretty nearly all of them?
If you understand functional languages, you can write imperative programs that are "informed" by a functional style. That will lead you away from side effects, and toward programs in which reading the program text at any particular point is sufficient to let you really know what the meaning of the program is at that point.
Back at the Dawn of Time we used to worry about "coupling" and "cohesion". Learning an FP will lead you to write systems with optimal (minimal) coupling, and high cohesion.
Here are things that get in the way of doing FP in a non-FP language:
If the language doesn't support lambdas/closures, and doesn't have any syntactic sugar to approximate them, you are dead in the water. You can't call map/filter without closures.
If the language is statically-typed and doesn't support generics, you are dead in the water. All the good FP stuff uses genericity.
If the language doesn't support tail-recursion, you are hindered. You can write implementations of e.g. 'map' iteratively; also often your data may not be too large and recursion will be ok.
If the language does not support algebraic data types and pattern-matching, you will be mildly hindered. It's just annoying not to have them once you've tasted them.
If the language cannot express type classes, well, oh well... you'll get by, but darn if that's not just the awesomest feature ever; unfortunately, Haskell is the only remotely popular language with good support for them.
Not having first-class functions really puts a damper on writing functional programs, but there are a few things that you can do that don't require them. The first is to eschew mutable state - try to have most or all of your classes return new objects that represent the modified state instead of making the change internally. As an example, if you were writing a linked list with an add operation, you would want to return the new linked list from add as opposed to modifying the object.
While this may make your programs less efficient (due to the increased number of objects being created and destroyed) you will gain the ability to more easily debug the program because the state and operation of the objects becomes more predictable, not to mention the ability to nest function calls more deeply because they have state inputs and outputs.
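In Haskell this pattern comes for free, because persistent structures share rather than copy (a sketch):

-- "add" returns a new list; the old one is untouched, and the two
-- share their common tail instead of copying it
addItem :: a -> [a] -> [a]
addItem = (:)

main :: IO ()
main = do
  let v1 = [2, 3]
      v2 = addItem 1 v1  -- [1,2,3]; the [2,3] cells are shared with v1
  print (v1, v2)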
I've successfully used higher-order functions a lot, especially the kind that are passed in rather than the kind that are returned. The kind that are returned can be a bit tedious but can be simulated.
All sorts of applicative data structures and recursive functions work well in imperative languages.
The things I miss the most:
Almost no imperative languages guarantee to optimize every tail call.
I know of no imperative language that supports case analysis by pattern matching.