Why don't lazy languages support mutation?

Why don't lazy languages support mutation? - programming-languages

I'm studying programming language theory and I can't figure out a solid reason why lazy languages don't have mutation. Anyone know the reason?

Laziness means that a function is not actually evaluated until (or unless) its return value is used. This means that function calls aren't necessarily evaluated in the order in which they appear in the code. It also means that there can't be void functions because they would never be evaluated (as its not possible to use a return value that does not exist).
However for functions that perform side effects (like mutation, but also just printing to the screen) it does matter in which order they're executed. It matters even more that they're executed at all. This means that lazy languages need a way to emulate side effects in special types that ensure that they are executed and executed in the right order.
Since entirely side effect-free programs are useless (you need to be able to print to the screen at all), lazy languages actually do support side effects. They just encapsulate them with the IO monad or uniqueness types. As an example haskell does have mutable arrays, but they can be only used inside the IO monad.

Mutation means you cannot be sure of the state of the program at any time and will have to worry about side effects from any action. I've actually thought about it and I can't think of any way to have a completely lazy language that supports mutation. (I am no computer scientist though.)

Related

What are Haskell's strictness points?

We all know (or should know) that Haskell is lazy by default. Nothing is evaluated until it must be evaluated. So when must something be evaluated? There are points where Haskell must be strict. I call these "strictness points", although this particular term isn't as widespread as I had thought. According to me:
Reduction (or evaluation) in Haskell only occurs at strictness points.
So the question is: what, precisely, are Haskell's strictness points? My intuition says that main, seq / bang patterns, pattern matching, and any IO action performed via main are the primary strictness points, but I don't really know why I know that.
(Also, if they're not called "strictness points", what are they called?)
I imagine a good answer will include some discussion about WHNF and so on. I also imagine it might touch on lambda calculus.
Edit: additional thoughts about this question.
As I've reflected on this question, I think it would be clearer to add something to the definition of a strictness point. Strictness points can have varying contexts and varying depth (or strictness). Falling back to my definition that "reduction in Haskell only occurs at strictness points", let us add to that definition this clause: "a strictness point is only triggered when its surrounding context is evaluated or reduced."
So, let me try to get you started on the kind of answer I want. main is a strictness point. It is specially designated as the primary strictness point of its context: the program. When the program (main's context) is evaluated, the strictness point of main is activated. Main's depth is maximal: it must be fully evaluated. Main is usually composed of IO actions, which are also strictness points, whose context is main.
Now you try: discuss seq and pattern matching in these terms. Explain the nuances of function application: how is it strict? How is it not? What about deepseq? let and case statements? unsafePerformIO? Debug.Trace? Top-level definitions? Strict data types? Bang patterns? Etc. How many of these items can be described in terms of just seq or pattern matching?

A good place to start is by understanding this paper: A Natural Semantics for Lazy Evalution (Launchbury). That will tell you when expressions are evaluated for a small language similar to GHC's Core. Then the remaining question is how to map full Haskell to Core, and most of that translation is given by the Haskell report itself. In GHC we call this process "desugaring", because it removes syntactic sugar.
Well, that's not the whole story, because GHC includes a whole raft of optimisations between desugaring and code generation, and many of these transformations will rearrange the Core so that things get evaluated at different times (strictness analysis in particular will cause things to be evaluated earlier). So to really understand how your
program will be evaluated, you need to look at the Core produced by GHC.
Perhaps this answer seems a bit abstract to you (I didn't specifically mention bang patterns or seq), but you asked for something precise, and this is about the best we can do.

I would probably recast this question as, Under what circumstances will Haskell evaluate an expression? (Perhaps tack on a "to weak head normal form.")
To a first approximation, we can specify this as follows:
Executing IO actions will evaluate any expressions that they “need.” (So you need to know if the IO action is executed, e.g. it's name is main, or it is called from main AND you need to know what the action needs.)
An expression that is being evaluated (hey, that's a recursive definition!) will evaluate any expressions it needs.
From your intuitive list, main and IO actions fall into the first category, and seq and pattern matching fall into the second category. But I think that the first category is more in line with your idea of "strictness point", because that is in fact how we cause evaluation in Haskell to become observable effects for users.
Giving all of the details specifically is a large task, since Haskell is a large language. It's also quite subtle, because Concurrent Haskell may evaluate things speculatively, even though we end up not using the result in the end: this is a third breed of things that cause evaluation. The second category is quite well studied: you want to look at the strictness of the functions involved. The first category too can be thought to be a sort of "strictness", though this is a little dodgy because evaluate x and seq x $ return () are actually different things! You can treat it properly if you give some sort of semantics to the IO monad (explicitly passing a RealWorld# token works for simple cases), but I don't know if there's a name for this sort of stratified strictness analysis in general.

C has the concept of sequence points, which are guarantees for particular operations that one operand will be evaluated before the other. I think that's the closest existing concept, but the essentially equivalent term strictness point (or possibly force point) is more in line with Haskell thinking.
In practice Haskell is not a purely lazy language: for instance pattern matching is usually strict (So trying a pattern match forces evaluation to happen at least far enough to accept or reject the match.
…
Programmers can also use the seq primitive to force an expression to evaluate regardless of whether the result will ever be used.
$! is defined in terms of seq.
—Lazy vs. non-strict.
So your thinking about !/$! and seq is essentially right, but pattern matching is subject to subtler rules. You can always use ~ to force lazy pattern matching, of course. An interesting point from that same article:
The strictness analyzer also looks for cases where sub-expressions are always required by the outer expression, and converts those into eager evaluation. It can do this because the semantics (in terms of "bottom") don't change.
Let's continue down the rabbit hole and look at the docs for optimisations performed by GHC:
Strictness analysis is a process by which GHC attempts to determine, at compile-time, which data definitely will 'always be needed'. GHC can then build code to just calculate such data, rather than the normal (higher overhead) process for storing up the calculation and executing it later.
—GHC Optimisations: Strictness Analysis.
In other words, strict code may be generated anywhere as an optimisation, because creating thunks is unnecessarily expensive when the data will always be needed (and/or may only be used once).
…no more evaluation can be performed on the value; it is said to be in normal form. If we are at any of the intermediate steps so that we've performed at least some evaluation on a value, it is in weak head normal form (WHNF). (There is also a 'head normal form', but it's not used in Haskell.) Fully evaluating something in WHNF reduces it to something in normal form…
—Wikibooks Haskell: Laziness
(A term is in head normal form if there is no beta-redex in head position1. A redex is a head redex if it is preceded only by lambda abstractors of non-redexes 2.) So when you start to force a thunk, you're working in WHNF; when there are no more thunks left to force, you're in normal form. Another interesting point:
…if at some point we needed to, say, print z out to the user, we'd need to fully evaluate it…
Which naturally implies that, indeed, any IO action performed from main does force evaluation, which should be obvious considering that Haskell programs do, in fact, do things. Anything that needs to go through the sequence defined in main must be in normal form and is therefore subject to strict evaluation.
C. A. McCann got it right in the comments, though: the only thing that's special about main is that main is defined as special; pattern matching on the constructor is sufficient to ensure the sequence imposed by the IO monad. In that respect only seq and pattern-matching are fundamental.

Haskell is AFAIK not a pure lazy language, but rather a non-strict language. This means that it does not necessarily evaluate terms at the last possible moment.
A good source for haskell's model of "lazyness" can be found here: http://en.wikibooks.org/wiki/Haskell/Laziness
Basically, it is important to understand the difference between a thunk and the weak header normal form WHNF.
My understanding is that haskell pulls computations through backwards as compared to imperative languages. What this means is that in the absence of "seq" and bang patterns, it will ultimately be some kind of side effect that forces the evaluation of a thunk, which may cause prior evaluations in turn (true lazyness).
As this would lead to a horrible space leak, the compiler then figures out how and when to evaluate thunks ahead of time to save space. The programmer can then support this process by giving strictness annotations (en.wikibooks.org/wiki/Haskell/Strictness , www.haskell.org/haskellwiki/Performance/Strictness) to further reduce space usage in form of nested thunks.
I am not an expert in the operational semantics of haskell, so I will just leave the link as a resource.
Some more resources:
http://www.haskell.org/haskellwiki/Performance/Laziness
http://www.haskell.org/haskellwiki/Haskell/Lazy_Evaluation

Lazy doesn't mean do nothing. Whenever your program pattern matches a case expression, it evaluates something -- just enough anyway. Otherwise it can't figure out which RHS to use. Don't see any case expressions in your code? Don't worry, the compiler is translating your code to a stripped down form of Haskell where they are hard to avoid using.
For a beginner, a basic rule of thumb is let is lazy, case is less lazy.

This is not a full answer aiming for karma, but just a piece of the puzzle -- to the extent that this is about semantics, bear in mind that there are multiple evaluation strategies that provide the same semantics. One good example here -- and the project also speaks to how we typically think of Haskell semantics -- was the Eager Haskell project, which radically altered evaluation strategies while maintaining the same semantics: http://csg.csail.mit.edu/pubs/haskell.html

The Glasgow Haskell compiler translates your code into a Lambda-calculus-like language called core. In this language, something is going to be evaluated, whenever you pattern match it by a case-statement. Thus if a function is called, the outermost constructor and only it (if there are no forced fields) is going to be evaluated. Anything else is canned in a thunk. (Thunks are introduced by let bindings).
Of course this is not exactly what happens in the real language. The compiler convert Haskell into Core in a very sophisticated way, making as many things as possibly lazy and anything that is always needed lazy. Additionally, there are unboxed values and tuples that are always strict.
If you try to evaluate a function by hand, you can basically think:
Try to evaluate the outermost constructor of the return.
If anything else is needed to get the result (but only if it's really needed) is also going to be evaluated. The order doesn't matters.
In case of IO you have to evaluate the results of all statements from the first to the last in that. This is a bit more complicated, since the IO monad does some tricks to force evaluation in a specific order.

We all know (or should know) that Haskell is lazy by default. Nothing is evaluated until it must be evaluated.
No.
Haskell is not a lazy language
Haskell is a language in which evaluation order doesn't matter because there are no side effects.
It's not quite true that evaluation order doesn't matter, because the language allows for infinite loops. If you aren't careful, it's possible to get stuck in a cul-de-sac where you evaluate a subexpression forever when a different evaluation order would have led to termination in finite time. So it's more accurate to say:
Haskell implementations must evaluate the program in a way that terminates if there is any evaluation order that terminates. Only if every possible evaluation order fails to terminate can the implementation fail to terminate.
This still leaves implementations with a huge freedom in how they evaluate the program.
A Haskell program is a single expression, namely let {all top-level bindings} in Main.main. Evaluation can be understood as a sequence of reduction (small-)steps which change the expression (which represents the current state of the executing program).
You can divide reduction steps into two categories: those that are provably necessary (provably will be part of any terminating sequence of reductions), and those that aren't. You can divide the provably necessary reductions somewhat vaguely into two subcategories: those that are "obviously" necessary, and those that require some nontrivial analysis to prove them necessary.
Performing only obviously necessary reductions is what's called "lazy evaluation". I don't know whether a purely lazy evaluating implementation of Haskell has ever existed. Hugs may have been one. GHC definitely isn't.
GHC performs reduction steps at compile time that aren't provably necessary; for example, it will replace 1+2::Int with 3::Int even if it can't prove that the result will be used.
GHC may also perform not-provably-necessary reductions at run time in some circumstances. For example, when generating code to evaluate f (x+y), if x and y are of type Int and their values will be known at run time, but f can't be proven to use its argument, there is no reason not to compute x+y before calling f. It uses less heap space and less code space and is probably faster even if the argument isn't used. However, I don't know whether GHC actually takes these sorts of optimization opportunities.
GHC definitely performs evaluation steps at run time that are proven necessary only by fairly complex cross-module analyses. This is extremely common and may represent the bulk of the evaluation of realistic programs. Lazy evaluation is a last-resort fallback evaluation strategy; it isn't what happens as a rule.
There was an "optimistic evaluation" branch of GHC that did much more speculative evaluation at run time. It was abandoned because of its complexity and the ongoing maintenance burden, not because it didn't perform well. If Haskell was as popular as Python or C++ then I'm sure there would be implementations with much more sophisticated runtime evaluation strategies, maintained by deep-pocketed corporations. Non-lazy evaluation isn't a change to the language, it's just an engineering challenge.
Reduction is driven by top-level I/O, and nothing else
You can model interaction with the outside world by means of special side-effectful reduction rules like: "If the current program is of the form getChar >>= <expr>, then get a character from standard input and reduce the program to <expr> applied to the character you got."
The entire goal of the run time system is to evaluate the program until it has one of these side-effecting forms, then do the side effect, then repeat until the program has some form that implies termination, such as return ().
There are no other rules about what is reduced when. There are only rules about what can reduce to what.
For example, the only rules for if expressions are that if True then <expr1> else <expr2> can be reduced to <expr1>, if False then <expr1> else <expr2> can be reduced to <expr2>, and if <exc> then <expr1> else <expr2>, where <exc> is an exceptional value, can be reduced to an exceptional value.
If the expression representing the current state of your program is an if expression, you have no choice but to perform reductions on the condition until it's True or False or <exc>, because that's the only way you'll ever get rid of the if expression and have any hope of reaching a state that matches one of the I/O rules. But the language specification doesn't tell you to do that in so many words.
These sorts of implicit ordering constraints are the only way that evaluation can be "forced" to happen. This is a frequent source of confusion for beginners. For example, people sometimes try to make foldl more strict by writing foldl (\x y -> x `seq` x+y) instead of foldl (+). This doesn't work, and nothing like it can ever work, because no expression can make itself evaluate. The evaluation can only "come from above". seq is not special in any way in this regard.
Reduction happens everywhere
Reduction (or evaluation) in Haskell only occurs at strictness points. [...] My intuition says that main, seq / bang patterns, pattern matching, and any IO action performed via main are the primary strictness points [...].
I don't see how to make sense of that statement. Every part of the program has some meaning, and that meaning is defined by reduction rules, so reduction happens everywhere.
To reduce a function application <expr1> <expr2>, you have to evaluate <expr1> until it has a form like (\x -> <expr1'>) or (getChar >>=) or something else that matches a rule. But for some reason function application doesn't tend to show up on lists of expressions that allegedly "force evaluation", while case always does.
You can see this misconception in a quote from the Haskell wiki, found in another answer:
In practice Haskell is not a purely lazy language: for instance pattern matching is usually strict
I don't understand what could qualify as a "purely lazy language" to whoever wrote that, except, perhaps, a language in which every program hangs because the runtime never does anything. If pattern matching is a feature of your language then you've got to actually do it at some point. To do it, you have to evaluate the scrutinee enough to determine whether it matches the pattern. That's the laziest way to match a pattern that is possible in principle.
~-prefixed patterns are often called "lazy" by programmers, but the language spec calls them "irrefutable". Their defining property is that they always match. Because they always match, you don't have to evaluate the scrutinee to determine whether they match or not, so a lazy implementation won't. The difference between regular and irrefutable patterns is what expressions they match, not what evaluation strategy you're supposed to use. The spec says nothing about evaluation strategies.
main is a strictness point. It is specially designated as the primary strictness point of its context: the program. When the program (main's context) is evaluated, the strictness point of main is activated. [...] Main is usually composed of IO actions, which are also strictness points, whose context is main.
I'm not convinced that any of that has any meaning.
Main's depth is maximal: it must be fully evaluated.
No, main only has to be evaluated "shallowly", to make I/O actions appear at the top level. main is the entire program, and the program isn't completely evaluated on every run because not all code is relevant to every run (in general).
discuss seq and pattern matching in these terms.
I already talked about pattern matching. seq can be defined by rules that are similar to case and application: for example, (\x -> <expr1>) `seq` <expr2> reduces to <expr2>. This "forces evaluation" in the same way that case and application do. WHNF is just a name for what these expressions "force evaluation" to.
Explain the nuances of function application: how is it strict? How is it not?
It's strict in its left expression just like case is strict in its scrutinee. It's also strict in the function body after substitution just like case is strict in the RHS of the selected alternative after substitution.
What about deepseq?
It's just a library function, not a builtin.
Incidentally, deepseq is semantically weird. It should take only one argument. I think that whoever invented it just blindly copied seq with no understanding of why seq needs two arguments. I count deepseq's name and specification as evidence that a poor understanding of Haskell evaluation is common even among experienced Haskell programmers.
let and case statements?
I talked about case. let, after desugaring and type checking, is just a way of writing an arbitrary expression graph in tree form. Here's a paper about it.
unsafePerformIO?
To an extent it can be defined by reduction rules. For example, case unsafePerformIO <expr> of <alts> reduces to unsafePerformIO (<expr> >>= \x -> return (case x of <alts>)), and at the top level only, unsafePerformIO <expr> reduces to <expr>.
This doesn't do any memoization. You could try to simulate memoization by rewriting every unsafePerformIO expression to explicitly memoize itself, and creating the associated IORefs... somewhere. But you could never reproduce GHC's memoization behavior, because it depends on unpredictable details of the optimization process, and because it isn't even type safe (as shown by the infamous polymorphic IORef example in the GHC documentation).
Debug.Trace?
Debug.Trace.trace is just a simple wrapper around unsafePerformIO.
Top-level definitions?
Top-level variable bindings are the same as nested let bindings. data, class, import, and such are a whole different ball game.
Strict data types? Bang patterns?
Just sugar for seq.

Eager evaluation/applicative order and lazy evaluation/normal order

As far as I know, eager evaluation/applicative order evaluates all arguments to a function before applying it, on the other hand, lazy evaluation/normal order evaluates the arguments only when needed.
So, what are the differences between the pair of terms eager evaluation and applicative order, and lazy evaluation and normal order?
Thanks.

Lazy evaluation evaluates a term at most once, while normal order would evaluate it as often as it appears. So for example if you have f(x) = x+x and you call it as f(g(42)) then g(42) is called once under lazy evaluation or applicative order, but twice under normal order.
Eager evaluation and applicative order are synonymous, at least when using the definition of applicative order found in Structure and Interpretation of Computer Programs, which seems to match yours. (Wikipedia defines applicative order a bit differently and has it as a special case of eager evaluation).

I'm reading SICP too, and I've been curious by the definition of normal order given by the authors. It seemed rather similar to Lazy evaluation to me, so I went looking for some more information regarding both.
I know this question was asked a long time ago, but I looked at the FAQ and found no mention of answering old questions, so I thought I'd leave what I've found here so other people could use it in the future.
This is what I've found, and I'm inclined to agree with those:
I would argue (as have others) that lazy evaluation and
NormalOrderEvaluation are two different things; the difference is
alluded to above. In lazy evaluation, evaluation of the argument is
deferred until it is needed, at which point the argument is evaluated
and its result saved (memoized). Further uses of the argument in the
function use the computed value. The C/C++ operators ||, &&, and ? :
are both examples of lazy evaluation. (Unless some newbie C/C++
programmer is daft enough to overload && or ||, in which case the
overloaded versions are evaluated in strict order; which is why the &&
and || operators should NEVER be overloaded in C++).
In other words, each argument is evaluated at most once, possibly not
at all.
NormalOrderEvaluation, on the other hand, re-evaluates the expression
each time it is used. Think of C macros, CallByName in languages which
support it, and the semantics of looping control structures, etc.
Normal-order evaluation can take much longer than applicative order
evaluation, and can cause side effects to happen more than once.
(Which is why, of course, statements with side effects generally ought
not be given as arguments to macros in C/C++)
If the argument is invariant and has no side effects, the only
difference between the two is performance. Indeed, in a purely
functional language, lazy eval can be viewed as an optimization of
normal-order evaluation. With side effects present, or expressions
which can return a different value when re-evaluated, the two have
different behavior; normal order eval in particular has a bad
reputation in procedural languages due to the difficulty of reasoning
about such programs without ReferentialTransparency
Should also be noted that strict-order evaluation (as well as lazy
evaluation) can be achieved in a language which supports normal-order
evaluation via explicit memoing. The opposite isn't true; it requires
passing in thunks, functions, or objects which can be called/messaged
in order to defer/repeat the evaluation.
And
Lazy evaluation combines normal-order evaluation and sharing:
• Never evaluate something until you have to (normal-order)
• Never evaluate something more than once (sharing)
http://c2.com/cgi/wiki?LazyEvaluation
http://cs.anu.edu.au/student/comp3610/lectures/Lazy.pdf

From Kevin Sookocheff's Normal, Applicative and Lazy Evaluation post (emphases, stylistic changes mine):
Lazy Evaluation
While normal-order evaluation may result in doing extra work by
requiring function arguments to be evaluated more than once,
applicative-order evaluation may result in programs that do not
terminate where their normal-order equivalents do. In practise, most
functional programming languages solve this problem using lazy
evaluation.
With lazy evalution, we delay function evaluation in a way that avoids
multiple evaluations of the same function — thus combining the
benefits of normal-order and applicative-order evaluation.
With lazy
evaluation, we evaluate a value when it is needed, and after
evaluation all copies of that expression are updated with the new
value. In effect, a parameter passed into a function is stored in a
single location in memory so that the parameter need only be evaluated
once. That is, we remember all of the locations where we a certain
argument will be used, and when we evaluate a function, we replace the
argument with the value.
As a result, with lazy evaluation, every
parameter is evaluated at most once.
This was too long to post as a comment beneath the question, and updating existing answers with it seemed inappropriate, hence this answer.

Departmental restriction against unsafePerformIO

There has been some talk at work about making it a department-wide policy of prohibiting the use of unsafePerformIO and its ilk. Personally, I don't really mind as I've always maintained that if I found myself wanting to use it, it usually meant that I need to rethink my approach.
Does this restriction sound reasonable? I seem to remember reading somewhere that it was included mainly for FFI, but I can't remember where I read that at the moment.
edit:
Ok, that's my fault. It wouldn't be restricted where it's reasonably needed, ie. FFI. The point of the policy is more to discourage laziness and code smells.

A lot of core libraries like ByteString use unsafePerformIO under the hood, for example to customize memory allocation.
When you use such a library, you're trusting that the library author has proven the referential transparency of their exported API, and that any necessary preconditions for the user are documented. Rather than a blanket ban, your department should establish a policy and a review process for making similar assurances internally.

Well, there are valid uses for unsafePerformIO. It's not there just to be decorative, or as a temptation to test your virtue. None of those uses, however, involve adding meaningful side effects to everyday code. Here's a few examples of uses that can potentially be justified, with various degrees of suspicion:
Wrapping a function that's impure internally, but has no externally observable side effects. This is the same basic idea as the ST monad, except that here the burden is on the programmer to show that the impurity doesn't "leak".
Disguising a function that's deliberately impure in some restricted way. For instance, write-only impurity looks the same as total purity "from the inside", since there's no way to observe the output that's produced. This can be useful for some kinds of logging or debugging, where you explicitly don't want the consistency and well-defined ordering required by the IO monad. An example of this is Debug.Trace.trace, which I sometimes refer to as unsafePerformPrintfDebugging.
Introspection on pure computations, producing a pure result. A classic example is something like the unambiguous choice operator, which can run two equivalent pure functions in parallel in order to get an answer quicker.
Internally unobservable breaking of referential transparency, such as introducing nondeterminism when initializing data. As long as each impure function is evaluated only once, referential transparency will be effectively preserved during any single run of the program, even if the same faux-pure function called with the same arguments gives different results on different runs.
The important thing to note about all of the above is that the resulting impurity is carefully controlled and limited in scope. Given a more fine-grained system of controlling side-effects than the all-purpose IO monad, these would all be obvious candidates for slicing off bits of semi-purity, much like the controlled mutable state in the aforementioned ST monad.
Post scriptum: If a hard-line stance against any non-required use of unsafePerformIO is being considered, I strongly encourage
extending the prohibition to include unsafeInterleaveIO and any functions that allow observation of its behavior. It's at least as sketchy as some of the unsafePerformIO examples I listed above, if you ask me.

unsafePerformIO is the runST of the IO monad. It is sometimes essential. However, unlike runST, the compiler cannot check that you are preserving referential transparency.
So if you use it, the programmer has a burden to explain why the use is safe. It shouldn't be banned, it should be accompanied with evidence.

Outlawing unsafePerformIO in "application" code is an excellent idea. In my opinion there is no excuse for unsafePerformIO in normal code and in my experience it is not needed. It is really not part of the language so you are not really programming in Haskell any more if you use it. How do you know what it even means?
On the other hand, using unsafePerformIO in an FFI binding is reasonable if you know what you are doing.

Outlawing unsafePerformIO is a terrible idea, because it effectively locks code into the IO monad: for example, a c library binding will almost always be in the IO monad - however, using unsafePerformIO a higher-level purely functional library can be built on top of it.
Arguably, unsafePerformIO reflects the compromise between the highly stateful model of the personal computer and the pure, stateless model of haskell; even a function call is a stateful from the computer's point of view since it requires pushing arguments onto a stack, messing with registers, etc., but the usage is based on the knowledge that these operations do in fact compose functionally.

Will I develop good/bad habits because of lazy evaluation?

I'm looking to learn functional programming with either Haskell or F#.
Are there any programming habits (good or bad) that could form as a result Haskell's lazy evaluation? I like the idea of Haskell's functional programming purity for the purposes of understanding functional programming. I'm just a bit worried about two things:
I may misinterpret lazy-evaluation-based features as being part of the "functional paradigm".
I may develop thought patterns that work in a lazy world but not in a normal order/eager evaluation world.

There are habits that you get into when programming in a lazy language that don't work in a strict language. Some of these seem so natural to Haskell programmers that they don't think of them as lazy evaluation. A couple of examples off the top of my head:
f x y = if x > y then .. a .. b .. else c
where
a = expensive
b = expensive
c = expensive
here we define a bunch of subexpressions in a where clause, with complete disregard for which of them will ever be evaluated. It doesn't matter: the compiler will ensure that no unnecessary work is performed at runtime. Non-strict semantics means that the compiler is able to do this. Whenever I write in a strict language I trip over this a lot.
Another example that springs to mind is "numbering things":
pairs = zip xs [1..]
here we just want to associate each element in a list with its index, and zipping with the infinite list [1..] is the natural way to do it in Haskell. How do you write this without an infinite list? Well, the fold isn't too readable
pairs = foldr (\x xs -> \n -> (x,n) : xs (n+1)) (const []) xs 1
or you could write it with explicit recursion (too verbose, doesn't fuse). There are several other ways to write it, none of which are as simple and clear as the zip.
I'm sure there are many more. Laziness is surprisingly useful, when you get used to it.

You'll certainly learn about evaluation strategies. Non-strict evaluation strategies can be very powerful for particular kinds of programming problems, and once you're exposed to them, you may be frustrated that you can't use them in some language setting.
I may develop thought patterns that work in a lazy world but not in a normal order/eager evaluation world.
Right. You'll be a more rounded programmer. Abstractions that provide "delaying" mechanisms are fairly common now, so you'd be a worse programmer not to know them.

I may misinterpret lazy-evaluation-based features as being part of the "functional paradigm".
Lazy evaluation is an important part of the functional paradigm. It's not a requirement - you can program functionally with eager evaluation - but it's a tool that naturally fits functional programming.
You see people explicitly implement/invoke it (notably in the form of lazy sequences) in languages that don't make it the default; and while mixing it with imperative code requires caution, pure functional code allows safe use of laziness. And since laziness makes many constructs cleaner and more natural, it's a great fit!
(Disclaimer: no Haskell or F# experience)

To expand on Beni's answer: if we ignore operational aspects in terms of efficiency (and stick with a purely functional world for the moment), every terminating expression under eager evaluation is also terminating under non-strict evaluation, and the values of both (their denotations) coincide.
This is to say that lazy evaluation is strictly more expressive than eager evaluation. By allowing you to write more correct and useful expressions, it expands your "vocabulary" and ability to think functionally.
Here's one example of why:
A language can be lazy-by-default but with optional eagerness, or eager by default with optional laziness, but in fact its been shown (c.f. Okasaki for example) that there are certain purely functional data structures which can only achieve certain orders of performance if implemented in a language that provides laziness either optionally or by default.
Now when you do want to worry about efficiency, then the difference does matter, and sometimes you will want to be strict and sometimes you won't.
But worrying about strictness is a good thing, because very often the cleanest thing to do (and not only in a lazy-by-default language) is to use a thoughtful mix of lazy and eager evaluation, and thinking along these lines will be a good thing no matter which language you wind up using in the future.
Edit: Inspired by Simon's post, one additional point: many problems are most naturally thought about as traversals of infinite structures rather than basically recursive or iterative. (Although such traversals themselves will generally involve some sort of recursive call.) Even for finite structures, very often you only want to explore a small portion of a potentially large tree. Generally speaking, non-strict evaluation allows you to stop mixing up the operational issue of what the processor actually bothers to figure out with the semantic issue of the most natural way to represent the actual structure you're using.

Recently, i found myself doing Haskell-style programming in Python. I took over a monolithic function that extracted/computed/generated values and put them in a file sink, in one step.
I thought this was bad for understanding, reuse and testing. My plan was to separate value generation and value processing. In Haskell i would have generated a (lazy) list of those computed values in a pure function and would have done the post-processing in another (side-effect bearing) function.
Knowing that non-lazy lists in Python can be expensive, if they tend to get big, i thought about the next close Python solution. To me that was to use a generator for the value generation step.
The Python code got much better thanks to my lazy (pun intended) mindset.

I'd expect bad habits.
I saw one of my coworkers try to use (hand-coded) lazy evaluation in our .NET project. Unfortunately the consequence of lazy evaluation hid the bug where it would try remote invocations before the start of main executed, and thus outside the try/catch to handle the "Hey I can't connect to the internet" case.
Basically, the manner of something was hiding the fact that something really expensive was hiding behind a property read and so made it look like a good idea to do inside the type initializer.

Contextual information missing.
Laziness (or more specifically, the assumption of the availabilty of the purity and equational reasoning) is sometimes quite useful for specific problem domains, but not necessarily better in general. If you're talking about general-purpose language settings, relying on the lazy evaluation rules by default is considered harmful.
Analysis
Any languages has functional combination (or the applicable terms combination; i.e. function call expression, function-like macro invocation, FEXPRs, etc.) enforces rules on evaluation, implying the order of different parts of subcomputation therein. For convenience and the simplicity of the specification of the language, a language usually specify the rules in a flavor paired to the reduction strategy:
The strict evaluation, or the applicative-order reduction, which evaluates all subexpression first, before the subcomputation of the remaining evaluation of the hole combination.
The non-strict evaluation, or the normal-order reduction, which does not necessarily evaluate every subexpression at first.
The remaining subcomputation finally determines the result of the whole evaluation of the expression. (For program-defined constructs, this usually implies the substitution of the evaluated argument into something like a function body, and the subsequent evaluation of the result.)
Lazy evaluation, or the call-by-need strategy, is a typical concrete instance of the non-strict evaluation kind. To make it practically usable, subexpression evaluations are required to be pure (side-effect-free), so the reductions implementing the strategy can have the Church-Rosser property whatever the order of subexpression evaluation is actually adopted.
One significant merit of such design is the availability of the equational resoning: users can encode the equality of expression evaluation in the program, and optimizing implementation of the language can perform the transformation depending directly on such constructs.
However, there are many serious problems behind such design.
Equational reasoning is not important as it in the first glance in practice.
The encoding is not a separate feature. It has some specific requirements on the other features to carry the encoding. For a pure language, it is even more difficult to encode them elsewhere, so there is certain pressure to make the type system more expressive, hence more complicated typing and typechecking.
Whether the compiler uses the equational reasoning directly encoded in the program or not is an implementation detail. It is more of a taste of style to promote the importance.
Syntatic equations are not powerful enough to encode semantic conditions like cases of "unspecified behavior" in ISO C. It still needs some additional primitives to express non-determinism of such semantic equivalence classes to make optimization techniques based on such equivalence possible.
It is computationally inefficient at the very basic level by default, and not amendable by the programmer easily.
There is no systemic way to reduce the cost on equations which are known not required by the programmer.
One of the significance comes from the clash between lazily evaluated combinations and proper tail recursion over the combinations.
The unpredictable abuse of thunks to memoize the lazily evaluated expressions also makes troubles on the utilization of the machine resources (e.g. registers and the cache memory).
Purely functional languages like Haskell may declare the referential transparency is a good thingTM. However, this is faulty in certain contexts.
There are semantic gaps over the terminology itself. The purity is not the only aspect for the referential transparency; moreover, there are other kinds of such property not readily provided by the evaluation strategy.
In general, referential transparency should not be a goal about programming. Instead, it is an optional manner to implement the composable components of programs. Composability is essentially about the expected invariance on the interface of the components. There are many ways to keep the composability without the aid of any kinds of referential transparency. Whether the guarantee should be enforced by the language rules? It depends. At least, it should not depend totally on the language designers' point.
The lack of impure evaluations requires more syntax noises to encode many constructs simply expressible by mutable state cells in the traditional impure languages. The workarounds of the practical problems do make the solution more difficult and hard to reason by humans.
For example, I/O operations are side-effectful, thus not directly expressible in Haskell expressions under the usual non-strict evaluation rules, otherwise the order of effects will be non-deterministic.
To overcoming the shortcoming, some indirect conventional constructs like the IO monad to simulate the traditional imperative style are proposed. Such monadic constructs are in essential "indirect" in the sense similar to the continuation-passing style, which is considerably low-level and difficult to read. Even though monads can be "powerful" than continuations in expresiveness, it does not naturally powerful than more high-level alternatives (like algebraic effect systems) when the lazy evaluation strategy is not enforced by default.
Besides the intuition problem above, the necessity of using monadic constructs are often difficult to prove formally (if ever possible). As the result, they are very easily abused (just like the design patterns for "OOP" languages derived from Simula). The related syntax sugar, notably, the famous do-notation, is abused for a few decades before well-known by the Haskell community.
Simulating strict language constructs in languages like Haskell usually needs monadic constructs, while simulating non-strict constructs in strict languages are considerably simpler and easier to implement efficiently. For instance, there is SRFI-45.
The lazy evaluation strategy does not deal with many other non-strict constructs well.
For example, seq has to be a compiler magic in GHC. This is not easily expressible by other Haskell constructs without massive changes in the core Haskell language rules.
Although traditional strict languages also do not allow user programs to simulate the enforcement of the order easily so such sequential constructs are therefore primitive (examples: C-like ; is primitive; the derivation of Scheme's begin is relying on the primitive lambda which in turn implying an implicit evaluation order on expressions), it can be implementable reusing the applicative order rules without additional ad-hoc primitives, like the derivation of the$sequence operator in the Kernel language.
Concerns about specific questions
Lazy evaluation is not a must for the "functional paradigm", though as mentioned above, purely functional languages are likely have the lazy evaluation strategy by default. The common properties are the usability of first-class functions. Impure languages like Lisp and ML family are considered "functional", which use eager evaluation by default. Also note the popularity of "functional paradigm" came after the introducing of function-level programming. The latter is quite different, but still somewhat similar to "functional programming" on the treatment of first-classness.
As mentioned above, the way to simulate laziness in eager languages are well-known. Additionally, for pure programs, there may be no non-trivially semantic difference between call-by-need and normal order reduction. To figure out something really only work in a lazy world is actually not easy. (Do you want to implement the language?) Just go ahead.
Conclusion
Be careful to the problem domain. Lazy evaluation may work well for specific scenarios. However, making it by default is likely to be a bad idea in general, because users (whoever to use the language to program, or to derive a new dialect based on the current language) will likely have few chances to ignore all of the problems it will cause.

Well, try to think of something that would work if lazily evaluated, that wouldn't if eagerly evaluated. The most common category of these would be lazy logical operator evaluation used to hide a "side effect". I'll use C#-ish language to explain, but functional languages would have similar analogs.
Take the simple C# lambda:
(a,b) => a==0 || ++b < 20
In a lazy-evaluated language, if a==0, the expression ++b < 20 is not evaluated (because the entire expression evaluates to true either way), which means that b is not incremented. In both imperative and functional languages, this behavior (and similar behavior of the AND operator) can be used to "hide" logic containing side effects that should not be executed:
(a,b) => a==0 && save(b)
"a" in this case may be the number of validation errors. If there were validation errors, the first half fails and the second half is not evaluated. If there were no validation errors, the second half is evaluated (which would include the side effect of trying to save b) and the result (apparently true or false) is returned to be evaluated. If either side evaluates to false, the lambda returns false indicating that b was not successfully saved. If this were evaluated "eagerly", we would try to save regardless of the value of "a", which would probably be bad if a nonzero "a" indicated that we shouldn't.
Side effects in functional languages are generally considered a no-no. However, there are few non-trivial programs that do not require at least one side effect; there's generally no other way to make a functional algorithm integrate with non-functional code, or with peripherals like a data store, display, network channel, etc.

Is Haskell really a purely functional language considering unsafePerformIO?

Haskell is generally referenced as an example of a purely functional language. How can this be justified given the existence of System.IO.Unsafe.unsafePerformIO ?
Edit: I thought with "purely functional" it was meant that it is impossible to introduce impure code into the functional part of the program.

The Languages We Call Haskell
unsafePerformIO is part of the Foreign Function Interface specification, not core Haskell 98 specification. It can be used to do local side effects that don't escape some scope, in order to expose a purely functional interface. That is, we use it to hide effects when the type checker can't do it for us (unlike the ST monad, which hides effects with a static guarantee).
To illustrate precisely the multiple languages that we call "Haskell", consider the image below. Each ring corresponds to a specific set of computational features, ordered by safety, and with area correlating to expressive power (i.e. the number of programs you can write if you have that feature).
The language known as Haskell 98 is specified right down in the middle, admitting total and partial functions. Agda (or Epigram), where only total functions are allowed, is even less expressive, but "more pure" and more safe. While Haskell as we use it today includes everything out to the FFI, where unsafePerformIO lives. That is, you can write anything in modern Haskell, though if you use things from the outer rings, it will be harder to establish safety and security guarantees made simple by the inner rings.
So, Haskell programs are not typically built from 100% referentially transparent code, however, it is the only moderately common language that is pure by default.

I thought with "purely functional" it was meant that it is impossible to introduce impure code...
The real answer is that unsafePerformIO is not part of Haskell, any more than say, the garbage collector or run-time system are part of Haskell. unsafePerformIO is there in the system so that the people who build the system can create a pure functional abstraction over very effectful hardware. All real languages have loopholes that make it possible for system builders to get things done in ways that are more effective than dropping down to C code or assembly code.
As to the broader picture of how side effects and I/O fit into Haskell via the IO monad, I think the easiest way to think of Haskell is that it is a pure language that describes effectful computations. When the computation described is main, the run-time system carries out those effects faithfully.
unsafePerformIO is a way to get effects in an unsafe manner; where "unsafe" means "safety must be guaranteed by the programmer"—nothing is checked by the compiler. If you are a savvy programmer and are willing to meet heavy proof obligations, you can use unsafePerformIO. But at that point you are not programming in Haskell any more; you are programming in an unsafe language that looks a lot like Haskell.

The language/implementation is purely functional. It includes a couple "escape hatches," which you don't have to use if you don't want to.

I don't think unsafePerformIO means that haskell somehow becomes impure. You can create pure (referentially transparent) functions from impure functions.
Consider the skiplist. In order for it to work well it requires access to a RNG, an impure function, but this doesn't make the data structure impure. If you add an item and then convert it to a list, the same list will be returned every time given the item you add.
For this reason I think unsafePerformIO should be thought of as promisePureIO. A function that means that functions that have side-effects and therefore would be labelled impure by the type system can become acknowledged as referentially transparent by the type system.
I understand that you have to have a slightly weaker definition of pure for this to hold though. i.e pure functions are referentially transparent and never called because of a side-effect (like print).

Unfortunately the language has to do some real world work, and this implies talking with the external environment.
The good thing is that you can (and should) limit the usage of this "out of style" code to few specific well documented portions of your program.

I have a feeling I'll be very unpopular for saying what I'm about to say, but felt I had to respond to some of the (in my opinion mis-) information presented here.
Although it's true that unsafePerformIO was officially added to the language as part of the FFI addendum, the reasons for this are largely historical rather than logical. It existed unofficially and was widely used long before Haskell ever had an FFI. It was never officially part of the main Haskell standard because, as you have observed, it was just too much of an embarrassment. I guess the hope was that it would just go away at some point in the future, somehow. Well that hasn't happened, nor will it in my opinion.
The development of FFI addendum provided a convenient pretext for unsafePerformIO to get snuck in to the official language standard as it probably doesn't seem too bad here, when compared to adding the capability to call foreign (I.E. C) code (where all bets are off regarding statically ensuring purity and type safety anyway). It was also jolly convenient to put it here for what are essentially political reasons. It fostered the myth that Haskell would be pure, if only it wasn't for all that dirty "badly designed" C, or "badly designed" operating systems, or "badly designed" hardware or .. whatever.. It's certainly true that unsafePerformIO is regularly used with FFI related code, but the reasons for this are often more to do with bad design of the FFI and indeed of Haskell itself, not bad design of whatever foreign thing Haskell is trying interface too.
So as Norman Ramsey says, the official position came to be that it was OK to use unsafePerformIO provided certain proof obligations were satisfied by whoever used it (primarily that doing this doesn't invalidate important compiler transformations like inlining and common sub-expression elimination). So far so good, or so one might think. The real kicker is that these proof obligations cannot be satisfied by what is probably the single most common use case for unsafePerformIO, which by my estimate accounts for well over 50% of all the unsafePerformIOs out there in the wild. I'm talking about the appalling idiom known as the "unsafePerformIO hack" which is provably (in fact obviously) completely unsafe (in the presence of inlining and cse) .
I don't really have the time, space or inclination to go into what the "unsafePerformIO hack" is or why it's needed in real IO libraries, but the bottom line is that folk who work on Haskells IO infrastructure are usually "stuck between a rock and a hard place". They can either provide an inherently safe API which has no safe implementation (in Haskell), or they can provide an inherently unsafe API which can be safely implemented, but what they can rarely do is provide safety in both API design and implementation. Judging by the depressing regularity with which the "unsafePerformIO hack" appears in real world code (including the Haskell standard libraries), it seems most choose the former option as the lesser of the two evils, and just hope that the compiler won't muck things up with inlining, cse or any other transformation.
I do wish all this was not so. Unfortunately, it is.

Safe Haskell, a recent extension of GHC, gives a new answer to this question. unsafePerformIO is a part of GHC Haskell, but not a part of the safe dialect.
unsafePerformIO should be used only to build referentially transparent functions; for example, memoization. In these cases, the author of a package marks it as "trustworthy". A safe module can import only safe and trustworthy modules; it cannot import unsafe modules.
For more information: GHC manual, Safe Haskell paper

Haskell is generally referenced as an example of a purely functional language. How can this be justified given the existence of System.IO.Unsafe.unsafePerformIO ?
Edit: I thought with "purely functional" it was meant that it is impossible to introduce impure code into the functional part of the program.
The IO monad is actually defined in Haskell, and you can in fact see its definition here. This monad does not exist to deal with impurities but rather to handle side effects. In any case, you could actually pattern match your way out of the IO monad, so the existence of unsafePerformIO shouldn't really be troubling to you.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string