I'm looking for creative uses of monads to learn from. I've read somewhere that monads have been used, for example, in AI, but being a monad newbie, I fail to see how.
Please include a link to the source code and sample usages. No standard monads please.
Phil Wadler has written many papers on monads, but the one to read first is a lot of fun and will be accessible to any programmer; it's called The essence of functional programming. The paper includes source code and sample usages.
A personal favorite of mine is the probability monad; if you can find Sungwoo Park's PhD thesis, it has a number of interesting code examples from robotics.
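If you can't get hold of the thesis, here is a toy sketch of the general idea (my own illustration, not Park's code): a distribution is a list of outcomes paired with probabilities, and bind weights each continuation by the probability of the outcome that produced it.

newtype Dist a = Dist { runDist :: [(a, Double)] }

instance Functor Dist where
  fmap f (Dist xs) = Dist [ (f x, p) | (x, p) <- xs ]

instance Applicative Dist where
  pure x = Dist [(x, 1)]
  Dist fs <*> Dist xs = Dist [ (f x, p * q) | (f, p) <- fs, (x, q) <- xs ]

instance Monad Dist where
  Dist xs >>= k = Dist [ (y, p * q) | (x, p) <- xs, (y, q) <- runDist (k x) ]

coin :: Dist Bool
coin = Dist [(True, 0.5), (False, 0.5)]

-- (&&) <$> coin <*> coin gives True with probability 0.25.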
There's also LogicT (backtracking monad transformer with fair operations and pruning).
It is valuable for AI search algorithms because its constructs for fair disjunction make it easy, for example, to interleave computations that each succeed an infinite number of times.
Its usage is described in the ICFP '05 paper Backtracking, Interleaving, and Terminating Monad Transformers.
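For a taste of why fairness matters, here is a small example using the logict package (assuming its Control.Monad.Logic API): plain mplus would produce evens forever, while interleave alternates fairly between the two infinite streams.

import Control.Monad (msum)
import Control.Monad.Logic (Logic, interleave, observeMany)

evens, odds :: Logic Int
evens = msum (map return [0, 2 ..])  -- succeeds infinitely often
odds  = msum (map return [1, 3 ..])  -- so does this

both :: Logic Int
both = evens `interleave` odds

main :: IO ()
main = print (observeMany 6 both)  -- [0,1,2,3,4,5]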
You can find interesting and advanced monads in the blog A Neighborhood of Infinity. Worth noting is the vector space monad and its use for describing rational tangles. Unfortunately, I don't think I understand this well enough to explain it here.
One of my favorite monads is Martin Escardo's search monad. It can be found on hackage in infinite-search package.
It is the monad of "search functions" for a set of elements of type a, namely (a -> Bool) -> Maybe a (finding an element in the set matching a given predicate).
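Here is a simplified sketch of the idea (the actual package works with genuinely infinite searchable sets and its API differs; these definitions are only illustrative). The interesting part is bind: to search the combined set, first search for an element whose continuation can itself succeed.

import Data.Maybe (isJust)

newtype Search a = Search { find :: (a -> Bool) -> Maybe a }

instance Functor Search where
  fmap f (Search g) = Search $ \p -> f <$> g (p . f)

instance Applicative Search where
  pure x = Search $ \p -> if p x then Just x else Nothing
  sf <*> sx = sf >>= \f -> fmap f sx

instance Monad Search where
  Search g >>= k = Search $ \p -> do
    a <- g (\x -> isJust (find (k x) p))  -- an element whose image can satisfy p
    find (k a) p                          -- then search within that image

fromList :: [a] -> Search a
fromList xs = Search $ \p ->
  case filter p xs of
    (y : _) -> Just y
    []      -> Nothing

-- find (fromList [1 .. 10]) even  ==>  Just 2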
One interesting use of monads is in parsing. Parsec is the standard example.
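For instance, a tiny parser for a pair of integers, using the parsec package (a minimal sketch; error handling and whitespace are ignored):

import Text.Parsec
import Text.Parsec.String (Parser)

integer :: Parser Int
integer = read <$> many1 digit

-- Parses input shaped like "(1,2)".
pair :: Parser (Int, Int)
pair = do
  _ <- char '('
  x <- integer
  _ <- char ','
  y <- integer
  _ <- char ')'
  return (x, y)

main :: IO ()
main = print (parse pair "" "(1,2)")  -- Right (1,2)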
Read this series of articles on monads used to model probability and probabilistic processes: http://www.randomhacks.net/articles/2007/03/03/smart-classification-with-haskell (follow the links to the previous/next parts).
Harpy, a package for run-time generation of x86 machine code, uses a code generation monad. From the description:
This is a combined reader-state-exception monad which handles all the details of handling code buffers, emitting binary data, relocation etc.
All the code generation functions in module Harpy.X86CodeGen live in this monad and use its error reporting facilities as well as the internal state maintained by the monad.
The library user can pass a user environment and user state through the monad. This state is independent from the internal state and may be used by higher-level code generation libraries to maintain their own state across code generation operations.
I found this a particularly interesting example because I think that this pattern is not uncommon: I'd invented something quite similar myself for generating a set of internal messages for my application based on messages received from a (stock) market data feed. It turns out to be an extremely comfortable way to have a framework keep track of various "global" things whilst composing simple operations that in and of themselves keep no state.
I took his idea of a user state (which I call a "substate") that can be passed through the monad one step further: I have a mechanism for swapping out and restoring the substate during the monad run:
-- | Given a generator that uses different substate type, convert it
-- to a generator that runs with our substate type. As well as the
-- other-substate-type generator, the caller must provide an initial
-- substate for that generator and a function taking the final substate
-- of the generator and producing a new substate of our type. This
-- preserves all other (non-substate) parts of the master state touched
-- by the generator.
--
mgConvertSubstate :: MsgGen msg st' a -> st' -> (st' -> st) -> MsgGen msg st a
This is used for subgroups of combinators that need their own state for a short period. These run with just their own substate, knowing nothing about the state of the generator that invoked them (which helps make things more modular), and yet this preserves any non-user-specific state, such as the current list of generated messages and the current set of warnings or errors, as well as the control flow (i.e., allowing total aborts to flow upwards).
I'd like to list a couple of monads not yet mentioned in other answers.
Enumeration and weighted search monads
The Omega monad can be used to productively traverse infinite lists of results. Compare:
>>> take 10 $ liftM2 (,) [0..] [0..]
[(0,0),(0,1),(0,2),(0,3),(0,4),(0,5),(0,6),(0,7),(0,8),(0,9)]
>>> take 10 $ runOmega $ liftM2 (,) (each [0..]) (each [0..])
[(0,0),(0,1),(1,0),(0,2),(1,1),(2,0),(0,3),(1,2),(2,1),(3,0)]
With the somewhat more advanced WeightedSearch monad, it is also possible to assign weights to computations so that the results of computations with lower weights appear first in the output.
Accumulating errors monad
The useful These data type forms a Monad similar to Either, but one that accumulates errors rather than stopping at the first. The package also defines a MonadChronicle class as well as a ChronicleT monad transformer based on These.
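A simplified sketch of the idea (the real package's definitions differ in detail, but the Monad semantics are the same: warnings accumulate through a Semigroup instead of aborting the computation):

data These e a = This e | That a | These e a
  deriving Show

instance Functor (These e) where
  fmap _ (This e)    = This e
  fmap f (That a)    = That (f a)
  fmap f (These e a) = These e (f a)

instance Semigroup e => Applicative (These e) where
  pure = That
  f <*> x = f >>= \g -> fmap g x

instance Semigroup e => Monad (These e) where
  This e    >>= _ = This e
  That a    >>= k = k a
  These e a >>= k = case k a of
    This e'    -> This (e <> e')
    That b     -> These e b
    These e' b -> These (e <> e') b

-- Both warnings survive, and we still get a result:
example :: These [String] Int
example = do
  x <- These ["x was defaulted"] 1
  y <- These ["y was defaulted"] 2
  return (x + y)
-- example == These ["x was defaulted","y was defaulted"] 3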
Related
AFAIK one of the new additions in GHC 8 is the ApplicativeDo language extension, which desugars do-notation to the corresponding Applicative methods (<$>, <*>) where possible. I have the following questions.
How does it decide whether the desugaring to Applicative methods is possible? From what I know, it does a dependency check (whether a later statement depends on the result of a former one) to decide eligibility. Are there any other criteria?
This addition presumably makes applicative code easier to read for types that don't have any Monad instance. But for structures that have both a Monad and an Applicative instance: is this a recommended practice (from the readability perspective)? Are there any other benefits?
How does it decide whether the desugaring to Applicative methods is possible? From what I know, it does a dependency check (whether a later statement depends on the result of a former one) to decide eligibility. Are there any other criteria?
The paper and trac page are the best sources of information. Some points:
ApplicativeDo uses applicatives wherever possible - including mixing them with >>= in some cases where only part of the do block is applicative
A sort of directed graph of dependencies is constructed to see which parts can be "parallelized"
Sometimes there is no obvious best translation of this graph! In this case, GHC chooses the translation that is (roughly speaking) locally smallest
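As a small illustration of the dependency check: in the block below, no statement uses the result of an earlier one, so with ApplicativeDo it typechecks with only an Applicative constraint, desugaring to something like (+) <$> a <*> b.

{-# LANGUAGE ApplicativeDo #-}

example :: Applicative f => f Int -> f Int -> f Int
example a b = do
  x <- a          -- x is not used by the next statement...
  y <- b          -- ...so a and b are independent
  pure (x + y)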
This addition presumably makes applicative code easier to read for types that don't have any Monad instance. But for structures that have both a Monad and an Applicative instance: is this a recommended practice (from the readability perspective)? Are there any other benefits?
Here is an answer about the difference between Applicative and Monad. To quote directly:
To deploy <*>, you choose two computations, one of a function, the other of an argument, then their values are combined by application. To deploy >>=, you choose one computation, and you explain how you will make use of its resulting values to choose the next computation. It is the difference between "batch mode" and "interactive" operation.
A prime example of this sort of thing is the Haxl monad (designed by Facebook) which is all about fetching data from some external source. Using Applicative, these requests can happen in parallel while Monad forces the requests to be sequential. In fact, this example is what motivated Simon Marlow at Facebook to make the ApplicativeDo extension in the first place and write the quoted paper about it.
In general, most Monad instances don't necessarily benefit from Applicative. From the same answer I quoted above:
I appreciate that ApplicativeDo is a great way to make more applicative (and in some cases that means faster) programs that were written in monadic style that you haven't the time to refactor. But otherwise, I'd argue applicative-when-you-can-but-monadic-when-you-must is also the better way to see what's going on.
So: use Applicative over Monad when possible, and exploit ApplicativeDo when it really is nicer to write (like it must have been in some cases at Facebook) than the corresponding applicative expression.
My question is whether monads in Haskell actually maintain Haskell's purity, and if so how. Frequently I have read about how side effects are impure but that side effects are needed for useful programs (e.g. I/O). In the next sentence it is stated that Haskell's solution to this is monads. Then monads are explained to some degree or another, but not really how they solve the side-effect problem.
I have seen this and this, and my interpretation of the answers is actually one that came to me in my own readings -- the "actions" of the IO monad are not the I/O themselves but objects that, when executed, perform I/O. But it occurs to me that one could make the same argument for any code or perhaps any compiled executable. Couldn't you say that a C++ program only produces side effects when the compiled code is executed? That all of C++ is inside the IO monad and so C++ is pure? I doubt this is true, but I honestly don't know in what way it is not. In fact, didn't Eugenio Moggi initially use monads to model the denotational semantics of imperative programs?
Some background: I am a fan of Haskell and functional programming and I hope to learn more about both as my studies continue. I understand the benefits of referential transparency, for example. The motivation for this question is that I am a grad student and I will be giving two one-hour presentations to a programming languages class, one covering Haskell in particular and the other covering functional programming in general. I suspect that the majority of the class is not familiar with functional programming, maybe having seen a bit of Scheme. I hope to be able to (reasonably) clearly explain how monads solve the purity problem without going into category theory and the theoretical underpinnings of monads, which I wouldn't have time to cover and anyway don't fully understand myself -- certainly not well enough to present.
I wonder if "purity" in this context is not really well-defined?
It's hard to argue conclusively in either direction because "pure" is not particularly well-defined. Certainly, something makes Haskell fundamentally different from other languages, and it's deeply related to managing side-effects and the IO type¹, but it's not clear exactly what that something is. Given a concrete definition to refer to we could just check if it applies, but this isn't easy: such definitions will tend to either not match everyone's expectations or be too broad to be useful.
So what makes Haskell special, then? In my view, it's the separation between evaluation and execution.
The base language—closely related to the λ-calculus—is all about the former. You work with expressions that evaluate to other expressions, 1 + 1 to 2. No side-effects here, not because they were suppressed or removed but simply because they don't make sense in the first place. They're not part of the model² any more than, say, backtracking search is part of the model of Java (as opposed to Prolog).
If we just stuck to this base language with no added facilities for IO, I think it would be fairly uncontroversial to call it "pure". It would still be useful as, perhaps, a replacement for Mathematica. You would write your program as an expression and then get the result of evaluating the expression at the REPL. Nothing more than a fancy calculator, and nobody accuses the expression language you use in a calculator of being impure³!
But, of course, this is too limiting. We want to use our language to read files and serve web pages and draw pictures and control robots and interact with the user. So the question, then, is how to preserve everything we like about evaluating expressions while extending our language to do everything we want.
The answer we've come up with? IO. A special type of expression that our calculator-like language can evaluate which corresponds to doing some effectful actions. Crucially, evaluation still works just as before, even for things in IO. The effects get executed in the order specified by the resulting IO value, not based on how it was evaluated. IO is what we use to introduce and manage effects into our otherwise-pure expression language.
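A tiny illustration of that separation: evaluating an expression of type IO () executes nothing, and the effects run only in the order dictated by the IO value that main ultimately denotes.

actions :: [IO ()]
actions = [putStrLn "one", putStrLn "two", putStrLn "three"]
-- evaluating this list prints nothing

main :: IO ()
main = sequence_ (reverse actions)  -- prints three, two, one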
I think that's enough to make describing Haskell as "pure" meaningful.
footnotes
¹ Note how I said IO and not monads in general: the concept of a monad is immensely useful for dozens of things unrelated to input and output, and the IO type has to be more than just a monad to be useful. I feel the two are linked too closely in common discourse.
² This is why unsafePerformIO is so, well, unsafe: it breaks the core abstraction of the language. This is the same as, say, putzing with specific registers in C: it can both cause weird behavior and stop your code from being portable because it goes below C's level of abstraction.
³ Well, mostly, as long as we ignore things like generating random numbers.
A function with type, for example, a -> IO b always returns an identical IO action when given the same input; it is pure in that it cannot possibly inspect the environment, and obeys all the usual rules for pure functions. This means that, among other things, the compiler can apply all of its usual optimization rules to functions with an IO in their type, because it knows they are still pure functions.
Now, the IO action returned may, when run, look at the environment, read files, modify global state, whatever, all bets are off once you run an action. But you don't necessarily have to run an action; you can put five of them into a list and then run them in reverse of the order in which you created them, or never run some of them at all, if you want; you couldn't do this if IO actions implicitly ran themselves when you created them.
Consider this silly program:
main :: IO ()
main = do
  inputs <- take 5 . lines <$> getContents
  let [line1, line2, line3, line4, line5] = map print inputs
  line3
  line1
  line2
  line5
If you run this, and then enter 5 lines, you will see them printed back to you but in a different order, and with one omitted, even though our Haskell program runs map print over them in the order they were received. You couldn't do this with C's printf, because it immediately performs its I/O when called; Haskell's version just returns an IO action, which you can still manipulate as a first-class value and do whatever you want with.
I see two main differences here:
1) In Haskell, you can do things that are not in the IO monad. Why is this good? Because if you have a function definitelyDoesntLaunchNukes :: Int -> IO Int you don't know that the resulting IO action doesn't launch nukes; it might, for all you know. cantLaunchNukes :: Int -> Int will definitely not launch any nukes (barring any ugly hacks that you should avoid in nearly all circumstances).
2) In Haskell, it's not just a cute analogy: IO actions are first-class values. You can put them in lists, and leave them there for as long as you want; they won't do anything unless they somehow become part of the main action. The closest that C has to that are function pointers, which are quite a bit more cumbersome to use. In C++ (and most modern imperative languages really) you have closures which technically could be used for this purpose, but rarely are - mainly because Haskell is pure and they aren't.
Why does that distinction matter here? Well, where are you going to get your other IO actions/closures from? Probably, functions/methods of some description. Which, in an impure language, can themselves have side effects, rendering the attempt of isolating them in these languages pointless.
fiction-mode: Active
It was quite a challenge, and I think a wormhole could be forming in the neighbour's backyard, but I managed to grab part of a Haskell I/O implementation from an alternate reality:
class Kleisli k where
  infixr 1 >=>
  simple :: (a -> b) -> (a -> k b)
  (>=>)  :: (a -> k b) -> (b -> k c) -> a -> k c

instance Kleisli IO where
  simple = primSimpleIO
  (>=>)  = primPipeIO

primitive primSimpleIO :: (a -> b) -> (a -> IO b)
primitive primPipeIO   :: (a -> IO b) -> (b -> IO c) -> a -> IO c
Back in our slightly-mutilated reality (sorry!), I have used this other form of Haskell I/O to define our form of Haskell I/O:
instance Monad IO where
  return x = simple (const x) ()
  m >>= k  = (const m >=> k) ()
and it works!
fiction-mode: Offline
My question is whether monads in Haskell actually maintain Haskell's purity, and if so how.
The monadic interface, by itself, doesn't restrain the effects - it is only an interface, albeit a jolly-versatile one. As my little work of fiction shows, there are other possible interfaces for the job - it's just a matter of how convenient they are to use in practice.
For an implementation of Haskell I/O, what keeps the effects under control is that all the pertinent entities, be they:
IO, simple, (>=>) etc
or:
IO, return, (>>=) etc
are abstract - how the implementation defines those is kept private.
Otherwise, you would be able to devise "novelties" like this:
what_the_heck =
  do spare_world <- getWorld  -- how easy was that?
     launchMissiles           -- let's mess everything up,
     putWorld spare_world     -- and bring it all back :-D
     what_the_heck            -- that was fun; let's do it again!
(Aren't you glad our reality isn't quite so pliable? ;-)
This observation extends to types like ST (encapsulated state) and STM (concurrency) and their stewards (runST, atomically etc). For types like lists, Maybe and Either, their orthodox definitions in Haskell mean no visible effects.
So when you see an interface - monadic, applicative, etc - for certain abstract types, any effects (if they exist) are contained by keeping its implementation private; safe from being used in aberrant ways.
I'm working on implementing the UCT algorithm in Haskell, which requires a fair amount of data juggling. Without getting into too much detail, it's a simulation algorithm where, at each "step," a leaf node in the search tree is selected based on some statistical properties, a new child node is constructed at that leaf, and the stats corresponding to the new leaf and all of its ancestors are updated.
Given all that juggling, I'm not really sharp enough to figure out how to make the whole search tree a nice immutable data structure à la Okasaki. Instead, I've been playing around with the ST monad a bit, creating structures composed of mutable STRefs. A contrived example (unrelated to UCT):
import Control.Monad
import Control.Monad.ST
import Data.STRef

data STRefPair s a b = STRefPair { left :: STRef s a, right :: STRef s b }

mkStRefPair :: a -> b -> ST s (STRefPair s a b)
mkStRefPair a b = do
  a' <- newSTRef a
  b' <- newSTRef b
  return $ STRefPair a' b'

derp :: (Num a, Num b) => STRefPair s a b -> ST s ()
derp p = do
  modifySTRef (left p) (\x -> x + 1)
  modifySTRef (right p) (\x -> x - 1)

herp :: (Num a, Num b) => (a, b)
herp = runST $ do
  p <- mkStRefPair 0 0
  replicateM_ 10 $ derp p
  a <- readSTRef $ left p
  b <- readSTRef $ right p
  return (a, b)

main = print herp -- should print (10, -10)
Obviously this particular example would be much easier to write without using ST, but hopefully it's clear where I'm going with this... if I were to apply this sort of style to my UCT use case, is that wrong-headed?
Somebody asked a similar question here a couple years back, but I think my question is a bit different... I have no problem using monads to encapsulate mutable state when appropriate, but it's that "when appropriate" clause that gets me. I'm worried that I'm reverting to an object-oriented mindset prematurely, where I have a bunch of objects with getters and setters. Not exactly idiomatic Haskell...
On the other hand, if it is a reasonable coding style for some set of problems, I guess my question becomes: are there any well-known ways to keep this kind of code readable and maintainable? I'm sort of grossed out by all the explicit reads and writes, and especially grossed out by having to translate from my STRef-based structures inside the ST monad to isomorphic but immutable structures outside.
I don't use ST much, but sometimes it is just the best solution. This can be in many scenarios:
There are already well-known, efficient ways to solve a problem. Quicksort is a perfect example of this. It is known for its speed and in-place behavior, which cannot be imitated by pure code very well.
You need rigid time and space bounds. Especially with lazy evaluation (and Haskell doesn't even specify whether there is lazy evaluation, just that it is non-strict), the behavior of your programs can be very unpredictable. Whether there is a memory leak could depend on whether a certain optimization is enabled. This is very different from imperative code, which has a fixed set of variables (usually) and defined evaluation order.
You've got a deadline. Although the pure style is almost always better practice and cleaner code, if you are used to writing imperatively and need the code soon, starting imperative and moving to functional later is a perfectly reasonable choice.
When I do use ST (and other monads), I try to follow these general guidelines:
Use Applicative style often. This makes the code easier to read and, if you do switch to an immutable version, much easier to convert. Not only that, but Applicative style is much more compact.
Don't just use ST. If you program only in ST, the result will be no better than a huge C program, possibly worse because of the explicit reads and writes. Instead, intersperse pure Haskell code where it applies. I often find myself using things like STRef s (Map k [v]): the map itself is being mutated, but much of the heavy lifting is done purely (see the sketch after this list).
Don't remake libraries if you don't have to. A lot of code written for IO can be cleanly, and fairly mechanically, converted to ST. Replacing all the IORefs with STRefs and IOs with STs in Data.HashTable was much easier than writing a hand-coded hash table implementation would have been, and probably faster too.
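As an illustration of that second guideline, here is a minimal sketch (names are my own) in which a single STRef holds a pure Map, so mutation is confined to one reference while the interesting work happens in the pure Data.Map API:

import Control.Monad.ST
import Data.STRef
import qualified Data.Map as M

-- Only the reference is mutated; the Map manipulation itself is pure.
addValue :: Ord k => STRef s (M.Map k [v]) -> k -> v -> ST s ()
addValue ref k v = modifySTRef ref (M.insertWith (++) k [v])

collect :: Ord k => [(k, v)] -> M.Map k [v]
collect kvs = runST $ do
  ref <- newSTRef M.empty
  mapM_ (\(k, v) -> addValue ref k v) kvs
  readSTRef ref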
One last note - if you are having trouble with the explicit reads and writes, there are ways around it.
Algorithms which make use of mutation and algorithms which do not are different algorithms. Sometimes there is a straightforward bounds-preserving translation from the former to the latter, sometimes a difficult one, and sometimes only one which does not preserve complexity bounds.
A skim of the paper reveals to me that I don't think it makes essential use of mutation -- and so I think a potentially really nifty lazy functional algorithm could be developed. But it would be a different but related algorithm to that described.
Below, I describe one such approach -- not necessarily the best or most clever, but pretty straightforward:
Here's the setup as I understand it: A) a branching tree is constructed; B) payoffs are then pushed back from the leaves to the root, which then indicates the best choice at any given step. But this is expensive, so instead only portions of the tree are explored to the leaves, in a nondeterministic manner. Furthermore, each further exploration of the tree is determined by what's been learned in previous explorations.
So we build code to describe the "stage-wise" tree. Then, we have another data structure to define a partially explored tree along with partial reward estimates. We then have a function of type randseed -> ptree -> ptree that, given a random seed and a partially explored tree, embarks on one further exploration of the tree, updating the ptree structure as it goes. Then we can just iterate this function, starting from an empty ptree, to get a list of increasingly well-sampled ptrees, and walk this list until some specified cutoff condition is met.
So now we've gone from one algorithm where everything is blended together to three distinct steps: 1) building the whole state tree, lazily; 2) updating a partial exploration with some sampling of that structure; and 3) deciding when we've gathered enough samples.
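A rough sketch of that decomposition (every name here is hypothetical, and a real PTree would carry actual statistics and children rather than a single counter):

import System.Random (StdGen, mkStdGen, split)

-- A stand-in for a partially explored tree with reward estimates.
data PTree = PTree { visits :: Int, totalReward :: Double }

-- Step 2: one further exploration, updating the partial tree.
-- A real version would descend the tree and sample a playout.
explore :: StdGen -> PTree -> PTree
explore _g (PTree n r) = PTree (n + 1) r

-- Iterating exploration from an empty tree gives a list of
-- increasingly well-sampled trees.
explorations :: StdGen -> PTree -> [PTree]
explorations g t = t : explorations g' (explore g1 t)
  where (g1, g') = split g

-- Step 3: walk the list until a cutoff condition is met.
result :: PTree
result = head . dropWhile ((< 1000) . visits)
       $ explorations (mkStdGen 42) (PTree 0 0)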
It can be really difficult to tell when using ST is appropriate. I would suggest you do it with ST and without ST (not necessarily in that order). Keep the non-ST version simple; using ST should be seen as an optimization, and you don't want to do that until you know you need it.
I have to admit that I cannot read the Haskell code. But if you use ST for mutating the tree, then you can probably replace this with an immutable tree without losing much because:
Same complexity for mutable and immutable tree
You have to mutate every node above the new leaf. An immutable tree has to replace all nodes above the modified node. So in both cases the touched nodes are the same, thus you don't gain anything in complexity.
In Java, for example, object creation is more expensive than mutation, so maybe you can gain a bit here in Haskell by using mutation, though I don't know this for sure. In any case, a small gain does not buy you much, because of the next point.
Updating the tree is presumably not the bottleneck
The evaluation of the new leaf will probably be much more expensive than updating the tree. At least this is the case for UCT in computer Go.
Use of the ST monad is usually (but not always) as an optimization. For any optimization, I apply the same procedure:
Write the code without it,
Profile and identify bottlenecks,
Incrementally rewrite the bottlenecks and test for improvements/regressions.
The other use case I know of is as an alternative to the state monad. The key difference being that with the state monad the type of all of the data stored is specified in a top-down way, whereas with the ST monad it is specified bottom-up. There are cases where this is useful.
Conceptually, it seems that a computation that performs output is very different from one that performs input only. The latter is, in one sense, much purer.
I, for one, would like to have a way to separate the input only parts of my programme from the ones that might actually write something out.
So, why is there no input only Monad?
Any reason why it wouldn't work to have an I monad (and an O Monad, which could be combined into the IO Monad)?
Edit: I mostly meant input as reading files, not interacting with the user. This is also my use case, where I can assume that input files do not change during the execution of the programme (otherwise, it's fine to get undefined behaviour).
I disagree with bdonlan's answer. It's true that neither input nor output are more "pure" but they are quite different. It's quite valid to critique IO as the single "sin bin" where all effects get crammed together, and it does make ensuring certain properties harder. For example, if you have many functions that you know only read from certain memory locations, and which could never cause those locations to be altered, it would be nice to know that you can reorder their execution. Or if you have a program that uses forkIO and MVars, it would be nice to know, based on its type, that it isn't also reading /etc/passwd.
Furthermore, one can compose monadic effects in a fashion besides just stacked transformers. You can't do this with all monads (just free monads), but for a case like this that's all you really need. The iospec package, for example, provides a pure specification of IO -- it doesn't separate reading and writing, but it does separate them from, e.g., STM, MVars, forkIO, and so forth.
http://hackage.haskell.org/package/IOSpec
The key ideas for how you can combine the different monads cleanly are described in the Data Types à la Carte paper (great reading, very influential, can't recommend it enough, etc.).
The 'Input' side of the IO monad is just as much output as it is input. If you consume a line of input, the fact that you consumed that input is communicated to the outside, and also serves to be recorded as impure state (ie, you don't consume the same line again later); it's just as much an output operation as a putStrLn. Additionally, input operations must be ordered with respect to output operations; this again limits how much you can separate the two.
If you want a pure read-only monad, you should probably use the reader monad instead.
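For example, a computation that only reads from an environment fixed up front can stay entirely pure (a minimal sketch using Control.Monad.Reader from mtl; the Config type is made up):

import Control.Monad.Reader

data Config = Config { userName :: String, verbose :: Bool }

greeting :: Reader Config String
greeting = do
  n <- asks userName
  v <- asks verbose
  return $ (if v then "Hello, " else "Hi ") ++ n

main :: IO ()
main = putStrLn (runReader greeting (Config "world" True))

The file contents still have to enter through IO once, but everything downstream of that point can be pure.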
That said, you seem to be a bit confused about what combining monads can do. While you can indeed combine two monads (assuming one is a monad transformer) and get some kind of hybrid semantics, you have to be able to run the result. That is, even if you could define an IT (OT Identity) r, how do you run it? You have no root IO monad in this case, so main must be a pure function. Which would mean you'd have main = runIdentity . runOT . runIT $ .... Which is nonsense, since you're getting impure effects from a pure context.
In other words, the type of the IO monad has to be fixed. It can't be a user-selectable transformed type, because its type is nailed down into main. Sure, you could call it I (O Identity), but you don't gain anything; O (I Identity) would be a useless type, as would be I [] or O Maybe, because you'd never be able to run any of these.
Of course, if IO is left as the fundamental IO monad type, you could define routines like:
runI :: I Identity r -> IO r
This works, but again, you can't have anything underneath this I monad very easily, and you're not gaining much from this complexity. What would it even mean to have an Output monad transformed over a List base monad, anyway?
When you obtain input, you cause side-effects that changes both the state of the outside world (the input is consumed) and your program (the input is used). When you output, you cause side-effects that only change the state of the outside world (output is produced); the act of outputting itself does not change the state of your program. So you might actually say that O is more "pure" than I.
Except that output does actually change the execution state of your program (It won't repeat the same output operation over and over; it has to have some sort of state change in order to move on). It all depends on how you look at it. But it's so much easier to lump the dirtiness of input and output into the same monad. Any useful program will both input and output. You can categorize the operations you use into one or the other, but I'm not seeing a convincing reason to employ the type system for the task.
Either you're messing with the outside world or you're not.
Short answer: IO is not I/O.
Other folks have longer answers if you like.
I think the division between pure and impure code is somewhat arbitrary. It depends on where you put the barrier. Haskell's designers decided to clearly separate pure functional part of the language from the rest.
So we have the IO monad, which incorporates all possible effects (as different as disk reads/writes, networking, and memory access). And the language enforces a clear division by means of the return type. This induces a kind of thinking which divides everything into pure and impure.
Where information security is concerned, it would be quite natural to separate reading and writing. But for Haskell's initial goal, to be a standard lazy pure functional language, it was overkill.
God I hate the term "code smell", but I can't think of anything more accurate.
I'm designing a high-level language & compiler to Whitespace in my spare time to learn about compiler construction, language design, and functional programming (compiler is being written in Haskell).
During the code generation phase of the compiler, I have to maintain "state"-ish data as I traverse the syntax tree. For example, when compiling flow-control statements I need to generate unique names for the labels to jump to (labels generated from a counter that's passed in, updated, & returned, and the old value of the counter must never be used again). Another example is when I come across in-line string literals in the syntax tree, they need to be permanently converted into heap variables (in Whitespace, strings are best stored on the heap). I'm currently wrapping the entire code generation module in the state monad to handle this.
I've been told that writing a compiler is a problem well suited to the functional paradigm, but I find that I'm designing this in much the same way I would design it in C (you really can write C in any language - even Haskell w/ state monads).
I want to learn how to think in Haskell (rather, in the functional paradigm) - not in C with Haskell syntax. Should I really try to eliminate/minimize use of the state monad, or is it a legitimate functional "design pattern"?
I've written multiple compilers in Haskell, and a state monad is a reasonable solution to many compiler problems. But you want to keep it abstract---don't make it obvious you're using a monad.
Here's an example from the Glasgow Haskell Compiler (which I did not write; I just work around a few edges), where we build control-flow graphs. Here are the basic ways to make graphs:
emptyGraph :: Graph
mkLabel :: Label -> Graph
mkAssignment :: Assignment -> Graph -- modify a register or memory
mkTransfer :: ControlTransfer -> Graph -- any control transfer
(<*>) :: Graph -> Graph -> Graph
But as you've discovered, maintaining a supply of unique labels is tedious at best, so we provide these functions as well:
withFreshLabel :: (Label -> Graph) -> Graph
mkIfThenElse :: (Label -> Label -> Graph) -- branch condition
             -> Graph                     -- code in the 'then' branch
             -> Graph                     -- code in the 'else' branch
             -> Graph                     -- resulting if-then-else construct
The whole Graph thing is an abstract type, and the translator just merrily constructs graphs in purely functional fashion, without being aware that anything monadic is going on. Then, when the graph is finally constructed, in order to turn it into an algebraic datatype we can generate code from, we give it a supply of unique labels, run the state monad, and pull out the data structure.
The state monad is hidden underneath; although it's not exposed to the client, the definition of Graph is something like this:
type Graph = RealGraph -> [Label] -> (RealGraph, [Label])
or a bit more accurately
type Graph = RealGraph -> State [Label] RealGraph
-- a Graph is a monadic function from a successor RealGraph to a new RealGraph
With the state monad hidden behind a layer of abstraction, it's not smelly at all!
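To make the trick concrete, here is a toy version of the same idea (much simplified, and not GHC's actual code; I use <#> for sequencing to avoid clashing with the Prelude's <*>):

import Control.Monad.State

type Label = Int

-- The abstract type: secretly a state computation over a label supply.
newtype Graph = Graph { unGraph :: State [Label] String }

mkLabel :: Label -> Graph
mkLabel l = Graph (return ("L" ++ show l ++ ":\n"))

(<#>) :: Graph -> Graph -> Graph
Graph a <#> Graph b = Graph ((++) <$> a <*> b)

withFreshLabel :: (Label -> Graph) -> Graph
withFreshLabel f = Graph $ do
  l <- state (\(x : xs) -> (x, xs))  -- take one label from the supply
  unGraph (f l)

-- Only here, at the very end, is the state monad actually run:
render :: Graph -> String
render g = evalState (unGraph g) [0 ..]

Clients compose Graphs with mkLabel, <#> and withFreshLabel in a purely functional style; nothing in their code reveals that a monad is threading the label supply underneath.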
I'd say that state in general is not a code smell, so long as it's kept small and well controlled.
This means that using monads such as State, ST or custom-built ones, or just having a data structure containing state data that you pass around to a few places, is not a bad thing. (Actually, monads are just assistance in doing exactly this!) However, having state that goes all over the place (yes, this means you, IO monad!) is a bad smell.
A fairly clear example of this was when my team was working on our entry for the ICFP Programming Contest 2009 (the code is available at git://git.cynic.net/haskell/icfp-contest-2009). We ended up with several different modular parts to this:
VM: the virtual machine that ran the simulation program
Controllers: several different sets of routines that read the output of the simulator and generated new control inputs
Solution: generation of the solution file based on the output of the controllers
Visualizers: several different sets of routines that read both the input and output ports and generated some sort of visualization or log of what was going on as the simulation progressed
Each of these has its own state, and they all interact in various ways through the input and output values of the VM. We had several different controllers and visualizers, each of which had its own different kind of state.
The key point here was that the internals of any particular state were limited to their own particular modules, and each module knew nothing about even the existence of state for other modules. Any particular set of stateful code and data was generally only a few dozen lines long, with a handful of data items in the state.
All this was glued together in one small function of about a dozen lines which had no access to the internals of any of the states, and which merely called the right things in the proper order as it looped through the simulation, and passed a very limited amount of outside information to each module (along with the module's previous state, of course).
When state is used in such a limited way, and the type system is preventing you from inadvertently modifying it, it's quite easy to handle. It's one of the beauties of Haskell that it lets you do this.
One answer says, "Don't use monads." From my point of view, this is exactly backwards. Monads are a control structure that, among other things, can help you minimize the amount of code that touches state. If you look at monadic parsers as an example, the state of the parse (i.e., the text being parsed, how far one has gotten into it, any warnings that have accumulated, etc.) must run through every combinator used in the parser. Yet there will only be a few combinators that actually manipulate the state directly; anything else uses one of these few functions. This allows you to see clearly, and in one place, all of the small amount of code that can change the state, and to reason more easily about how it can be changed, again making it easier to deal with.
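A toy illustration of that point (a deliberately minimal parser, not a real library): only the item primitive touches the state representation; every other combinator is built on top of it and the instances.

newtype Parser a = Parser { runParser :: String -> Maybe (a, String) }

-- The only primitive that manipulates the state (the remaining input):
item :: Parser Char
item = Parser $ \s -> case s of
  (c : cs) -> Just (c, cs)
  []       -> Nothing

instance Functor Parser where
  fmap f (Parser p) = Parser $ \s -> fmap (\(a, s') -> (f a, s')) (p s)

instance Applicative Parser where
  pure a = Parser $ \s -> Just (a, s)
  Parser pf <*> Parser pa = Parser $ \s -> do
    (f, s')  <- pf s
    (a, s'') <- pa s'
    Just (f a, s'')

-- Built without ever mentioning the input string:
two :: Parser (Char, Char)
two = (,) <$> item <*> item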
Have you looked at attribute grammars (AGs)? (More info on Wikipedia and in an article in the Monad Reader.)
With AGs you can add attributes to a syntax tree. These attributes are separated into synthesized and inherited attributes.
Synthesized attributes are things you generate (or synthesize) from your syntax tree; this could be the generated code, or all comments, or whatever else you're interested in.
Inherited attributes are input to your syntax tree; this could be the environment, or a list of labels to use during code generation.
At Utrecht University we use the Attribute Grammar System (UUAGC) to write compilers. This is a pre-processor which generates Haskell code (.hs files) from the provided .ag files.
Although, if you're still learning Haskell, then maybe this is not the time to start learning yet another layer of abstraction over that.
In that case, you could manually write the sort of code that attribute grammars generate for you, for example:
data AbstractSyntax = Literal Int
                    | Block AbstractSyntax
                    | Comment String AbstractSyntax

compile :: AbstractSyntax -> [Label] -> (Code, Comments)
compile (Literal x)     _      = (generateCode x, [])
compile (Block ast)     (l:ls) = let (code', comments) = compile ast ls
                                 in (labelCode l code', comments)
compile (Comment s ast) ls     = let (code, comments') = compile ast ls
                                 in (code, s : comments')

generateCode :: Int -> Code
labelCode    :: Label -> Code -> Code
It's possible that you may want an applicative functor instead of a monad:
http://www.haskell.org/haskellwiki/Applicative_functor
I think the original paper explains it better than the wiki, however:
http://www.soi.city.ac.uk/~ross/papers/Applicative.html
I don't think using the State monad is a code smell when it is used to model state.
If you need to thread state through your functions, you can do this explicitly, taking the state as an argument and returning it from each function. The State monad offers a good abstraction: it passes the state along for you and provides lots of useful functions for combining functions that require state. In this case, using the State monad (or Applicatives) is not a code smell.
However, if you use the State monad to emulate an imperative style of programming while a functional solution would suffice, you are just making things complicated.
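A minimal sketch of the contrast, using a counter for fresh labels like the one described in the question (illustrative only):

import Control.Monad.State

-- Explicit threading: the counter is both an argument and a result.
freshExplicit :: Int -> (Int, Int)
freshExplicit n = (n, n + 1)

-- With State, the plumbing is handled for us:
fresh :: State Int Int
fresh = do
  n <- get
  put (n + 1)
  return n

twoLabels :: State Int (Int, Int)
twoLabels = (,) <$> fresh <*> fresh  -- Applicative style, as mentioned above

main :: IO ()
main = print (evalState twoLabels 0)  -- prints (0,1)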
In general you should try to avoid state wherever possible, but that's not always practical. Applicative makes effectful code look nicer and more functional, especially tree traversal code can benefit from this style. For the problem of name generation there is now a rather nice package available: value-supply.
Well, don't use monads. The power of functional programming lies in function purity and reuse. There's a paper a professor of mine once wrote; he's one of the people who helped build Haskell.
The paper is called "Why Functional Programming Matters"; I suggest you read through it. It's a good read.
Let's be careful about the terminology here. State is not bad per se; functional languages have state. The "code smell" is when you find yourself wanting to assign values to variables and change them.
Of course, the Haskell state monad is there for just that reason -- as with I/O, it's letting you do unsafe and un-functional things in a constrained context.
So, yes, it's probably a code smell.