Data structure to represent automata - Haskell

I'm currently trying to come up with a data structure that fits the needs of two automata learning algorithms I'd like to implement in Haskell: RPNI and EDSM.
Intuitively, something close to what zippers are to trees would be perfect: those algorithms are state-merging algorithms that maintain some sort of focus (the Blue Fringe) on states and would therefore benefit from some kind of zipper to reach interesting points quickly. But I'm kinda lost because a DFA (Deterministic Finite Automaton) is more a graph-like structure than a tree-like structure: transitions can take you back into the structure, which makes zippers unlikely to work well.
So my question is: how would you go about representing a DFA (or at least its transitions) so that you could manipulate it in a fast fashion?

Let me begin with the usual opaque representation of automata in Haskell:
newtype Auto a b = Auto (a -> (b, Auto a b))
This represents a function that takes some input and produces some output along with a new version of itself. For convenience it's a Category as well as an Arrow. It's also a family of applicative functors. Unfortunately this type is opaque. There is no way to analyze the internals of this automaton. However, if you replace the opaque function by a transparent expression type you should get automata that you can analyze and manipulate:
data Expr :: * -> * -> * where
  -- Stateless
  Id :: Expr a a
  -- Combinators
  Connect :: Expr a b -> Expr b c -> Expr a c
  -- Stateful
  Counter :: (Enum b) => b -> Expr a b
This gives you access to the structure of the computation. It is also a Category, but not an arrow. Once it becomes an arrow you have opaque functions somewhere.
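For instance, a single-step interpreter over Expr might look like this (a minimal sketch; the Counter case assumes the counter emits its successor on each input):
-- Run a single step of an Expr: produce an output and the next Expr.
stepExpr :: Expr a b -> a -> (b, Expr a b)
stepExpr Id            a = (a, Id)
stepExpr (Connect f g) a =
  let (b, f') = stepExpr f a
      (c, g') = stepExpr g b
  in  (c, Connect f' g')
stepExpr (Counter n)   _ = (succ n, Counter (succ n))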

Can you just use a graph to get started? I think the fgl package is part of the Haskell Platform.
Otherwise you can try defining your own structure with 'deriving (Data)' and use the "Scrap Your Zipper" library to get the Zipper.

If you don't need any fancy graph algorithms you can represent your DFA's transition function as a Map, e.g. Map (State, Symbol) State. This gives you fast access and manipulation. You also get focus by keeping track of the current state.
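A minimal sketch of that representation, with illustrative State and Symbol types and an explicit field for the focused state:
import qualified Data.Map as M

type State  = Int
type Symbol = Char

data DFA = DFA
  { transitions :: M.Map (State, Symbol) State
  , accepting   :: [State]
  , current     :: State              -- the state currently in focus
  }

-- Follow one transition from the focused state, if it exists.
step :: Symbol -> DFA -> Maybe DFA
step c dfa = do
  s' <- M.lookup (current dfa, c) (transitions dfa)
  return dfa { current = s' }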

Take a look at the regex-tdfa package: http://hackage.haskell.org/package/regex-tdfa
The source is pretty complex, but it's an implementation of regexes with tagged DFAs tuned for performance, so it should illustrate some good practices for representing DFAs efficiently.

Related

How to use category theory diagrams with polyary functions?

So, there's a lot of buzz about categories all around the Haskell ecosystem. But I feel one piece is missing from the common sense I have so far absorbed by osmosis. (I did read the first few pages of Mac Lane's famous introduction as well, but I don't believe I have enough mathematical maturity to carry the wisdom from this text to actual programming I have at hand.) I will now follow with a real world example involving a binary function that I have trouble depicting in categorical terms.
So, I have this function chain that gets me from S to A, where A is a type synonym for a function, akin to a -> b. Now, I want to depict a process that does S -> a -> b, but I end up with an arrow pointing to another arrow rather than to an object. How do I deal with such a predicament?
I did overhear someone talking about a thing called n-category but I don't know if I should even try to understand what it is and how it's useful.
Though I believe my abstraction is accurate, the actual functions are parsePath >>> either error id >>> toAxis :: String -> Text.XML.Cursor.Axis from selectors and Axis = Text.XML.Cursor.Cursor -> [Text.XML.Cursor.Cursor] from xml-conduit.
There are two approaches to model binary functions as morphism in category theory (n-ary functions are dealt with similarly -- no new machinery is needed). One is to consider the uncurried version:
(A * B) -> C
where we take the product of the types A and B as a starting object. For that we need the category to contain such products. (In Haskell, products are written (A, B). Well, technically in Haskell this is not exactly the product as in categories, but let's ignore that.)
Another is to consider the result type (B -> C) as an object in the category. Usually, this is called an exponential object, written as C^B. Assuming our category has such objects, we can write
A -> C^B
These two representations of binary functions are isomorphic: using curry and uncurry we can transform each one into the other.
Indeed, when there is such a (natural) isomorphism, we get a so called cartesian closed category, which is the simplest form of category which can describe a simply typed lambda calculus -- the core of every typed functional language.
This isomorphism is often cited as an adjunction between two functors
(- * B) -| (- ^ B)
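In Haskell terms, the two directions of the isomorphism are just curry and uncurry (a trivial sketch with illustrative names; in the question's example, String -> Axis, i.e. String -> (Cursor -> [Cursor]), is the curried form of (String, Cursor) -> [Cursor]):
toExponential :: ((a, b) -> c) -> (a -> (b -> c))
toExponential = curry

fromExponential :: (a -> (b -> c)) -> ((a, b) -> c)
fromExponential = uncurry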
I can use tuple projections to depict this situation in a diagram (the diagram and its Haskell rendering are not reproduced here).
This diagram features backwards fst & snd arrows in place of a binary function that constructs the tuple from its constituents, which I cannot depict directly. The caveat is that, while in this diagram Cursor has only one incoming arrow, I should remember that in actual code some real arrows X -> Axis & Y -> Cursor should go to both of the projections of the tuple, not just the symbolic projecting functions. The flow will then be uniformly left to right.
Pragmatically speaking, I traded an arrow with two sources (that constructs a tuple and isn't a morphism) for two reversed arrows (the tuple's projections that are legal morphisms in all regards).

Monotonic sequence type in Haskell

I would like to define a type for infinite number sequences in Haskell. My idea is:
type MySeq = Natural -> Ratio Integer
However, I would also like to be able to define some properties of the sequence on the type level. A simple example would be a non-decreasing sequence like this one. Is it possible to do this with the current dependent-type capabilities of GHC?
EDIT: I came up with the following idea:
type PositiveSeq = Natural -> Ratio Natural
data IncreasingSeq = IncreasingSeq {
  start :: Ratio Natural,
  diff  :: PositiveSeq }
type IKnowItsIncreasing = [Ratio Natural]
getSeq :: IncreasingSeq -> IKnowItsIncreasing
getSeq s = scanl (+) (start s) [diff s i | i <- [1..]]
Of course, it's basically a hack and not actually type safe at all.
This isn't doing anything very fancy with types, but you could change how you interpret a sequence of naturals to get essentially the same guarantee.
I think you are thinking along the right lines in your edit to the question. Consider
data IncreasingSeq = IncreasingSeq (Integer -> Ratio Natural)
where each ratio represents how much it has increased from the previous number (starting with 0).
Then you can provide a single function
applyToIncreasing :: ([Ratio Natural] -> r) -> IncreasingSeq -> r
applyToIncreasing f (IncreasingSeq s) = f . drop 1 $ scanl (+) 0 (map s [0..])
This should let you deconstruct it in any way, without allowing the function to inspect the real structure.
You just need a way to construct it: probably a fromList that just sorts it and an insert that performs a standard ordered insertion.
It pains part of me to say this, but I don't think you'd gain anything over this using fancy type tricks: there are only three functions that could ever possibly go wrong, and they are fairly simple to correctly implement. The implementation is hidden so anything that uses those is correct as a result of those functions being correct. Just don't export the data constructor for IncreasingSeq.
I would also suggest considering making [Ratio Natural] be the underlying representation. It simplifies things and guarantees that there are no "gaps" in the sequence (so it is guaranteed to be a sequence).
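A minimal sketch of what that might look like with the list representation and a hidden constructor (module and function names are illustrative):
module IncreasingSeq
  ( IncreasingSeq        -- abstract: the data constructor is not exported
  , fromList
  , insert
  , applyToIncreasing
  ) where

import qualified Data.List as L
import Data.Ratio (Ratio)
import Numeric.Natural (Natural)

-- Invariant: the wrapped list is non-decreasing.
newtype IncreasingSeq = IncreasingSeq [Ratio Natural]

-- Smart constructors: the only ways to build an IncreasingSeq.
fromList :: [Ratio Natural] -> IncreasingSeq
fromList = IncreasingSeq . L.sort

insert :: Ratio Natural -> IncreasingSeq -> IncreasingSeq
insert x (IncreasingSeq xs) = IncreasingSeq (L.insert x xs)

-- Deconstruction without exposing the representation.
applyToIncreasing :: ([Ratio Natural] -> r) -> IncreasingSeq -> r
applyToIncreasing f (IncreasingSeq xs) = f xs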
If you want more safety and can take the performance hit, you can use data Nat = Z | S Nat instead of Natural.
I will say that if this was Coq, or a similar language, instead of Haskell I would be more likely to suggest doing some fancier type-level stuff (depending on what you are trying to accomplish) for a couple reasons:
In systems like Coq, you are usually proving theorems about the code. Because of this, it can be useful to have a type-level proof that a certain property holds. Since Haskell doesn't really have a builtin way to prove those sorts of theorems, the utility diminishes.
On the other hand, we can (sometimes) construct data types that essentially must have the properties we want using a small number of trusted functions and a hidden implementation. In the context of a system with more theorem-proving capability, like Coq, it might be harder to convince the theorem prover of the property this way than if we used a dependent type (possibly, at least). In Haskell, however, we don't have that issue in the first place.

When to expose constructors of a data type when designing data structures?

When designing data structures in functional languages there are 2 options:
Expose their constructors and pattern match on them.
Hide their constructors and use higher-level functions to examine the data structures.
In what cases, what is appropriate?
Pattern matching can make code much more readable or simpler. On the other hand, if we need to change something in the definition of a data type then all places where we pattern-match on them (or construct them) need to be updated.
I've been asking this question myself for some time. Often it happens to me that I start with a simple data structure (or even a type alias) and it seems that constructors + pattern matching will be the easiest approach and produce a clean and readable code. But later things get more complicated, I have to change the data type definition and refactor a big part of the code.
The essential factor for me is the answer to the following question:
Is the structure of my datatype relevant to the outside world?
For example, the internal structure of the list datatype is very much relevant to the outside world - it has an inductive structure that is certainly very useful to expose to consumers, because they construct functions that proceed by induction on the structure of the list. If the list is finite, then these functions are guaranteed to terminate. Also, defining functions in this way makes it easy to provide properties about them, again by induction.
By contrast, it is best for the Set datatype to be kept abstract. Internally, it is implemented as a tree in the containers package. However, it might as well have been implemented using arrays, or (more usefully in a functional setting) with a tree with a slightly different structure and respecting different invariants (balanced or unbalanced, branching factor, etc). The need to enforce any invariants over and above those that the constructors already enforce through their types, by the way, precludes letting the datatype be concrete.
The essential difference between the list example and the set example is that the Set datatype is only relevant for the operations that are possible on Sets, whereas lists are relevant not only because the standard library already provides many functions to act on them, but also because their structure itself is relevant.
As a sidenote, one might object that actually the inductive structure of lists, which is so fundamental to writing functions whose termination and behaviour are easy to reason about, is captured abstractly by two functions that consume lists: foldr and foldl. Given these two basic list operators, most functions do not need to inspect the structure of a list at all, and so it could be argued that lists too could be kept abstract. This argument generalizes to many other similar structures, such as all Traversable structures, all Foldable structures, etc. However, it is nigh impossible to capture all possible recursion patterns on lists, and in fact many functions aren't recursive at all. Given only foldr and foldl, writing head, for example, would still be possible, though quite tedious:
head xs = fromJust $ foldl (\b x -> maybe (Just x) Just b) Nothing xs
We're much better off just giving away the internal structure of the list.
One final point is that sometimes the actual representation of a datatype isn't relevant to the outside world, because, say, it is optimised in some way and might not be the canonical representation, or there isn't a single "canonical" representation. In these cases, you'll want to keep your datatype abstract, but offer "views" of your datatype, which do provide concrete representations that can be pattern matched on.
One example would be if we wanted to define a Complex datatype of complex numbers, where both cartesian and polar forms can be considered canonical. In this case, you would keep Complex abstract, but export two views, i.e. functions polar and cartesian that return a pair of a magnitude and an angle, or a coordinate in the cartesian plane, respectively.
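A sketch of how such views might look (illustrative code, not a real library):
module Complex (Complex, fromCartesian, fromPolar, cartesian, polar) where

-- The representation is hidden; a cartesian pair was chosen here, but
-- callers cannot tell.
data Complex = Complex Double Double

fromCartesian :: Double -> Double -> Complex
fromCartesian = Complex

fromPolar :: Double -> Double -> Complex
fromPolar r theta = Complex (r * cos theta) (r * sin theta)

-- Views: concrete representations that clients can pattern match on.
cartesian :: Complex -> (Double, Double)
cartesian (Complex x y) = (x, y)

polar :: Complex -> (Double, Double)    -- (magnitude, angle)
polar (Complex x y) = (sqrt (x * x + y * y), atan2 y x)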
Well, the rule is pretty simple: If it's easy to construct wrong values by using the actual constructors, then don't allow them to be used directly, but instead provide smart constructors. This is the path followed by some data structures like Map and Set, which are easy to get wrong.
Then there are the types for which it's impossible or hard to construct inconsistent/wrong values either because the type doesn't allow that at all or because you would need to introduce bottoms. The length-indexed list type (commonly called Vec) and most monads are examples of that.
Ultimately this is your own decision. Put yourself into the user's perspective and make the tradeoff between convenience and safety. If there is no tradeoff, then always expose the constructors. Otherwise your library users will hate you for the unnecessary opacity.
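For reference, the length-indexed Vec mentioned above is usually defined along these lines (a standard sketch using GADTs and DataKinds):
{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

data Nat = Z | S Nat

-- The length is part of the type, so a Vec "of the wrong shape" simply
-- cannot be constructed; exposing VNil and VCons is therefore harmless.
data Vec (n :: Nat) a where
  VNil  :: Vec 'Z a
  VCons :: a -> Vec n a -> Vec ('S n) a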
If the data type serves a simple purpose (like Maybe a) and no (explicit or implicit) assumptions about the data type can be violated by directly constructing a value via the data constructors, I would expose the constructors.
On the other hand, if the data type is more complex (like a balanced tree) and/or its internal representation is likely to change, I usually hide the constructors.
When using a package, there's an unwritten rule that the interface exposed by a non-internal module should be "safe" to use on the given data type. Considering the balanced tree example, exposing the data constructors allows one to (accidentally) construct an unbalanced tree, and so the assumed runtime guarantees for searching the tree etc might be violated.
If the type is used to represent values with a canonical definition and representation (many mathematical objects fall into this category), and it's not possible to construct "invalid" values using the type, then you should expose the constructors.
For example, if you're representing something like two dimensional points with your own type (including a newtype), you might as well expose the constructor. The reality is that a change to this datatype is not going to be a change in how 2d points are represented, it's going to be a change in your need to use 2d points (maybe you're generalising to 3d space, maybe you're adding a concept of layers, or whatever), and is almost certain to need attention in the parts of the code using values of this type no matter what you do.[1]
A complex type representing something specific to your application or field is quite likely to undergo changes to the representation while continuing to support similar operations. Therefore you only want other modules depending on the operations, not on the internal structure. So you shouldn't expose the constructors.
Other types represent things with canonical definitions but not canonical representations. Everyone knows the properties expected of maps and sets, but there are lots of different ways of representing values that support those properties. So you again only want other modules depending on the operations they support, not on the particular representations.
Some types, whether or not they are simple with canonical representations, allow the construction of values in the program which don't represent a valid value in the abstract concept the type is supposed to represent. A simple example would be a type representing a self-balancing binary search tree; client code with access to the constructors could easily construct invalid trees. Exposing the constructors either means you need to assume that such values passed in from outside may be invalid and therefore you need to make something sensible happen even for bizarre values, or means that it's the responsibility of the programmers working with your interface to ensure they don't violate any assumptions. It's usually better to just keep such types from being constructed directly outside your module.
Basically it comes down to the concept your type is supposed to represent. If your concept maps in a very simple and obvious[2] way directly to values in some data type which isn't "more inclusive" than the concept due to the compiler being unable to check needed invariants, then the concept is pretty much "the same" as the data type, and exposing its structure is fine. If not, then you probably need to keep the structure hidden.
[1] A likely change though would be to change which numeric type you're using for the coordinate values, so you probably do have to think about how to minimise the impact of such changes. That's pretty orthogonal to whether or not you expose the constructors though.
[2] "Obvious" here meaning that if you asked 10 people independently to come up with a data type representing the concept they would all come back with the same thing, modulo changing the names.
I would propose a different, noticeably more restrictive rule than most people. The central criterion would be:
Do you guarantee that this type will never, ever change? If so, exposing the constructors might be a good idea. Good luck with that, though!
But the types for which you can make that guarantee tend to be very simple, generic "foundation" types like Maybe, Either or [], which one could arguably write once and then never revisit again.
Though even those can be questioned, because they do get revisited from time to time; there are people who have used Church-encoded versions of Maybe and List in various contexts for performance reasons, e.g.:
{-# LANGUAGE RankNTypes #-}
newtype Maybe' a = Maybe' { elimMaybe' :: forall r. r -> (a -> r) -> r }
nothing = Maybe' $ \z k -> z
just x = Maybe' $ \z k -> k x
newtype List' a = List' { elimList' :: forall r. (a -> r -> r) -> r -> r }
nil = List' $ \k z -> z
cons x xs = List' $ \k z -> k x (elimList' xs k z)
These two examples highlight something important: you can replace the Maybe' type's implementation shown above with any other implementation as long as it supports the following three functions:
nothing :: Maybe' a
just :: a -> Maybe' a
elimMaybe' :: Maybe' a -> r -> (a -> r) -> r
...and the following laws:
elimMaybe' nothing z f == z
elimMaybe' (just x) z f == f x
And this technique can be applied to any algebraic data type. Which to me says that pattern matching against concrete constructors is just insufficiently abstract; it doesn't really gain you anything that you can't get out of the abstract constructors + destructor pattern, and it loses implementation flexibility.
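For instance, a fromMaybe analogue can be written purely against that abstract interface (a one-line sketch):
fromMaybe' :: a -> Maybe' a -> a
fromMaybe' def m = elimMaybe' m def id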

Choosing between a class and a record

Basic question: what design principles should one follow when choosing between using a class or using a record (with polymorphic fields) ?
First, we know that classes and records are essentially equivalent (since in Core, classes get desugared to dictionaries, which are just records). Nevertheless, there are differences: classes are passed implicitly, records must be explicit.
Looking a little deeper, classes are really useful when:
we have many different representations of 'the same thing', and
in actual usage, which representation is used can be inferred.
Classes are awkward when we have (up to parametric polymorphism) only one representation of our data, but we have multiple instances. This leads to the syntactic noise of having to use newtype to add extra tags (which exist only in our code, as we know such tags get erased at run time) if we don't want to turn on all sorts of troublesome extensions (i.e. overlapping and/or undecidable instances).
Of course, things get muddier: what if I want to have constraints on my types? Let's pick a real example:
class (Bounded i, Enum i) => Partition a i where
  index :: a -> i
I could just as easily have done
data Partition a i = Partition { index :: a -> i}
But now I've lost my constraints, and I will have to add them to specific functions instead.
Are there design guidelines that would help me out?
I tend to see no issue with only requiring constraints on functions. The issue is, I suppose, that your data structure no longer models precisely what you intend it to. On the other hand, if you think of it as a data structure first and foremost, then that should matter less.
I feel like I don't necessarily still have a good grasp on the question, and this is about as vague as can be, but my rule of thumb tends to be that typeclasses are things that obey laws (or model meaning), and datatypes are things that encode a certain quantity of information.
When we want to layer behavior in complex ways, I've found that typeclasses start off enticingly, but can get painful quickly and switching to dictionary-passing makes things more straightforward. Which is to say that when we want implementations to be interoperable, then we should fall back to a uniform dictionary type.
This is take two, expanding a bit on a concrete example, but still just sort of spinning ideas...
Suppose we want to model probability distributions over the reals. Two natural representations come to mind.
A) Typeclass-driven
class PDist a where
  sample :: a -> Gen -> Double
B) Dictionary-driven
data PDist = PDist (Gen -> Double)
The former lets us do
data NormalDist = NormalDist Double Double -- mean, var
instance PDist NormalDist where...
data LognormalDist = LognormalDist Double Double
instance PDist LognormalDist where...
The latter lets us do
mkNormalDist :: Double -> Double -> PDist...
mkLognormalDist :: Double -> Double -> PDist...
In the former, we can write
data SumDist a b = SumDist a b
instance (PDist a, PDist b) => PDist (SumDist a b)...
in the latter we can simply write
sumDist :: PDist -> PDist -> PDist
So what are the tradeoffs? Typeclass-driven lets us specify what distributions we're given. The tradeoff is that we have to construct an algebra of distributions explicitly, including new types for their combinations. Data-driven doesn't let us restrict which distributions we're given (or even check that they're well-formed), but in return we can do whatever the heck we want.
Furthermore, we can write a parseDist :: String -> PDist relatively easily, but we have to go through some angst to do the equivalent for the typeclass approach.
So this is, in a sense, the typed/untyped, static/dynamic tradeoff at another level. We can give it a twist though, and argue that the typeclass, along with associated algebraic laws, specifies the semantics of a probability distribution. And the PDist type can indeed be made an instance of the PDist typeclass. Meanwhile, we can resign ourselves to using the PDist type (rather than the typeclass) nearly everywhere, while thinking of it as isomorphic to the tower of instances and datatypes necessary to use the typeclass more "richly."
In fact, we can even define basic PDist functions in terms of the typeclass functions, e.g. mkNormalPDist m v = PDist (sample $ NormalDist m v). So there's lots of room in the design space to slide between the two representations as necessary...
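A compilable sketch of that interplay (the class is renamed IsPDist here so it can coexist with the PDist data type in one module, and Gen is a placeholder):
newtype Gen = Gen Int        -- placeholder for whatever random source is used

-- The typeclass-driven interface (renamed so it can coexist with the type).
class IsPDist a where
  sample :: a -> Gen -> Double

-- The dictionary-driven representation.
newtype PDist = PDist (Gen -> Double)

-- The dictionary type is itself an instance of the class...
instance IsPDist PDist where
  sample (PDist f) = f

-- ...and any instance can be reified into the dictionary type.
toPDist :: IsPDist a => a -> PDist
toPDist d = PDist (sample d)

data NormalDist = NormalDist Double Double   -- mean, var

instance IsPDist NormalDist where
  sample (NormalDist m _v) _g = m            -- stand-in for real sampling

mkNormalPDist :: Double -> Double -> PDist
mkNormalPDist m v = toPDist (NormalDist m v)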
Note: I'm not sure that I understand the OP exactly. Suggestions/comments for improvement appreciated!
Background:
When I first learned about typeclasses in Haskell, the general rule-of-thumb I picked up was that, in comparison to Java-like languages:
typeclasses are similar to interfaces
data are similar to classes
Here's another SO question and answer that describe guidelines for using interfaces (also some drawbacks of interface over-use). My interpretation:
records/Java-classes are what something is
interfaces/typeclasses are roles that a concretion can fulfil
multiple, unrelated concretions can fulfil the same role
I bet you already know all this.
The guidelines I try to follow for my own code are:
typeclasses are for abstractions
records are for concretions
So in practice this means:
let the needs of the data determine the records
let the client code determine what the interfaces are -- clients should depend on abstractions, and thereby drive the creation and design of typeclasses
Example:
typeclass Show, with function show :: (Show s) => s -> String: for data that can be represented as a String.
clients just want to turn data into strings
clients don't care what the data (concretion) is -- only care that it can be represented as a string
role of implementing data: can be string-ified
this could not be achieved without a typeclass -- each datatype would require a conversion function with a different name, what a pain to deal with!
Type-classes can sometimes provide additional type-safety (An example would be Ord with Data.Map.union). If you have similar circumstances where choosing type-classes may help your type-safety - then use type-classes.
I'll present a different example where I think type-classes would not provide additional safety:
class Drawing a where
  drawAsHtml :: a -> Html
  drawOpenGL :: a -> IO ()
exampleFunctionA :: Drawing a => a -> a -> Something
exampleFunctionB :: (Drawing a, Drawing b) => a -> b -> Something
There is nothing exampleFunctionA could do that exampleFunctionB could not do (I find it hard to explain why; insights are welcome).
In this case I see no benefit of using a type-class.
(Edited following feedback from Jacques and question from missingo)

ST Monad == code smell?

I'm working on implementing the UCT algorithm in Haskell, which requires a fair amount of data juggling. Without getting into too much detail, it's a simulation algorithm where, at each "step," a leaf node in the search tree is selected based on some statistical properties, a new child node is constructed at that leaf, and the stats corresponding to the new leaf and all of its ancestors are updated.
Given all that juggling, I'm not really sharp enough to figure out how to make the whole search tree a nice immutable data structure à la Okasaki. Instead, I've been playing around with the ST monad a bit, creating structures composed of mutable STRefs. A contrived example (unrelated to UCT):
import Control.Monad
import Control.Monad.ST
import Data.STRef
data STRefPair s a b = STRefPair { left :: STRef s a, right :: STRef s b }
mkStRefPair :: a -> b -> ST s (STRefPair s a b)
mkStRefPair a b = do
  a' <- newSTRef a
  b' <- newSTRef b
  return $ STRefPair a' b'

derp :: (Num a, Num b) => STRefPair s a b -> ST s ()
derp p = do
  modifySTRef (left p) (\x -> x + 1)
  modifySTRef (right p) (\x -> x - 1)

herp :: (Num a, Num b) => (a, b)
herp = runST $ do
  p <- mkStRefPair 0 0
  replicateM_ 10 $ derp p
  a <- readSTRef $ left p
  b <- readSTRef $ right p
  return (a, b)
main = print herp -- should print (10, -10)
Obviously this particular example would be much easier to write without using ST, but hopefully it's clear where I'm going with this... if I were to apply this sort of style to my UCT use case, is that wrong-headed?
Somebody asked a similar question here a couple years back, but I think my question is a bit different... I have no problem using monads to encapsulate mutable state when appropriate, but it's that "when appropriate" clause that gets me. I'm worried that I'm reverting to an object-oriented mindset prematurely, where I have a bunch of objects with getters and setters. Not exactly idiomatic Haskell...
On the other hand, if it is a reasonable coding style for some set of problems, I guess my question becomes: are there any well-known ways to keep this kind of code readable and maintainable? I'm sort of grossed out by all the explicit reads and writes, and especially grossed out by having to translate from my STRef-based structures inside the ST monad to isomorphic but immutable structures outside.
I don't use ST much, but sometimes it is just the best solution. This can be in many scenarios:
There are already well-known, efficient ways to solve a problem. Quicksort is a perfect example of this. It is known for its speed and in-place behavior, which cannot be imitated by pure code very well.
You need rigid time and space bounds. Especially with lazy evaluation (and Haskell doesn't even specify whether there is lazy evaluation, just that it is non-strict), the behavior of your programs can be very unpredictable. Whether there is a memory leak could depend on whether a certain optimization is enabled. This is very different from imperative code, which has a fixed set of variables (usually) and defined evaluation order.
You've got a deadline. Although the pure style is almost always better practice and cleaner code, if you are used to writing imperatively and need the code soon, starting imperative and moving to functional later is a perfectly reasonable choice.
When I do use ST (and other monads), I try to follow these general guidelines:
Use Applicative style often. This makes the code easier to read and, if you do switch to an immutable version, much easier to convert. Not only that, but Applicative style is much more compact.
Don't just use ST. If you program only in ST, the result will be no better than a huge C program, possibly worse because of the explicit reads and writes. Instead, intersperse pure Haskell code where it applies. I often find myself using things like STRef s (Map k [v]). The map itself is being mutated, but much of the heavy lifting is done purely; a small example of that pattern follows these guidelines.
Don't remake libraries if you don't have to. A lot of code written for IO can be cleanly, and fairly mechanically, converted to ST. Replacing all the IORefs with STRefs and IOs with STs in Data.HashTable was much easier than writing a hand-coded hash table implementation would have been, and probably faster too.
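A small sketch of the STRef s (Map k [v]) pattern mentioned above (addEdge is just an illustrative name):
import Control.Monad.ST
import Data.STRef
import qualified Data.Map as M

-- The reference is mutable, but the Map update itself is ordinary pure code.
addEdge :: Ord k => STRef s (M.Map k [v]) -> k -> v -> ST s ()
addEdge ref k v = modifySTRef ref (M.insertWith (++) k [v])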
One last note - if you are having trouble with the explicit reads and writes, there are ways around it.
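One such way (a sketch only, not necessarily what the answer had in mind): keep the data immutable and use the State monad, so the explicit reads and writes become ordinary record access and modify:
import Control.Monad (replicateM_)
import Control.Monad.State

data Pair a b = Pair { leftP :: a, rightP :: b }

derpPure :: (Num a, Num b) => State (Pair a b) ()
derpPure = modify $ \p -> p { leftP = leftP p + 1, rightP = rightP p - 1 }

herpPure :: (Num a, Num b) => (a, b)
herpPure = (\p -> (leftP p, rightP p)) (execState (replicateM_ 10 derpPure) (Pair 0 0))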
Algorithms which make use of mutation and algorithms which do not are different algorithms. Sometimes there is a straightforward bounds-preserving translation from the former to the latter, sometimes a difficult one, and sometimes only one which does not preserve complexity bounds.
A skim of the paper reveals to me that I don't think it makes essential use of mutation -- and so I think a potentially really nifty lazy functional algorithm could be developed. But it would be a different but related algorithm to that described.
Below, I describe one such approach -- not necessarily the best or most clever, but pretty straightforward:
Here's the setup as I understand it -- A) a branching tree is constructed, B) payoffs are then pushed back from the leaves to the root, which then indicates the best choice at any given step. But this is expensive, so instead only portions of the tree are explored to the leaves in a nondeterministic manner. Furthermore, each further exploration of the tree is determined by what's been learned in previous explorations.
So we build code to describe the "stage-wise" tree. Then we have another data structure to define a partially explored tree along with partial reward estimates. We then have a function randseed -> ptree -> ptree that, given a random seed and a partially explored tree, embarks on one further exploration of the tree, updating the ptree structure as it goes. Then we can just iterate this function, starting from an empty ptree, to get a list of increasingly well-sampled ptrees. We can then walk this list until some specified cutoff condition is met.
So now we've gone from one algorithm where everything is blended together to three distinct steps -- 1) building the whole state tree, lazily, 2) updating some partial exploration with some sampling of a structure and 3) deciding when we've gathered enough samples.
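A minimal sketch of steps 2 and 3 (PTree and explore are placeholders for the real structures and exploration logic):
import System.Random (StdGen, split)

-- A stand-in for the partially explored tree with its reward estimates.
data PTree = PTree { samplesTaken :: Int }

-- 2) One further nondeterministic exploration, guided by what has been
--    learned so far (placeholder body).
explore :: StdGen -> PTree -> PTree
explore _g t = t { samplesTaken = samplesTaken t + 1 }

-- 3) Iterate exploration to get a stream of increasingly sampled trees and
--    walk it until the cutoff condition is met.
search :: StdGen -> PTree -> (PTree -> Bool) -> PTree
search g0 t0 done = head (filter done (go g0 t0))
  where
    go g t =
      let (g1, g2) = split g
          t'       = explore g1 t
      in  t' : go g2 t'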
It can be really difficult to tell when using ST is appropriate. I would suggest you do it both with ST and without ST (not necessarily in that order). Keep the non-ST version simple; using ST should be seen as an optimization, and you don't want to do that until you know you need it.
I have to admit that I cannot read the Haskell code. But if you use ST for mutating the tree, then you can probably replace this with an immutable tree without losing much because:
Same complexity for mutable and immutable tree
You have to mutate every node above the new leaf. An immutable tree has to replace all nodes above the modified node. So in both cases the touched nodes are the same, thus you don't gain anything in complexity.
In e.g. Java, object creation is more expensive than mutation, so maybe you can gain a bit here in Haskell by using mutation. But this I don't know for sure. And a small gain does not buy you much because of the next point.
Updating the tree is presumably not the bottleneck
The evaluation of the new leaf will probably be much more expensive than updating the tree. At least this is the case for UCT in computer Go.
Use of the ST monad is usually (but not always) an optimization. For any optimization, I apply the same procedure:
Write the code without it,
Profile and identify bottlenecks,
Incrementally rewrite the bottlenecks and test for improvements/regressions.
The other use case I know of is as an alternative to the state monad. The key difference being that with the state monad the type of all of the data stored is specified in a top-down way, whereas with the ST monad it is specified bottom-up. There are cases where this is useful.
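A small illustration of that difference (sketch only):
import Control.Monad.ST
import Control.Monad.State
import Data.STRef

-- Top-down: the complete state type must be decided up front.
topDown :: State (Int, String) ()
topDown = modify (\(n, s) -> (n + 1, s ++ "x"))

-- Bottom-up: each piece of state is introduced locally, where it is needed.
bottomUp :: ST s (Int, String)
bottomUp = do
  n <- newSTRef (0 :: Int)
  s <- newSTRef ""
  modifySTRef n (+ 1)
  modifySTRef s (++ "x")
  (,) <$> readSTRef n <*> readSTRef s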
