Managing a stateful computation system in Haskell - haskell

So, I have a system of stateful processors that are chained together. For example, a processor might output the average of its last 10 inputs. It requires state to calculate this average.
I would like to submit values to the system, and get the outputs. I also would like to jump back and restore the state at any time in the past. Ie. I run 1000 values through the system. Now I want to "move" the system back to exactly as it was after I had sent the 500th value through. Then I want to "replay" the system from that point again.
I also need to be able to persist the historical state to disk so I can restore the whole thing again some time in the future (and still have the move back and replay functions work). And of course, I need to do this with gigabytes of data, and have it be extremely fast :)
I had been approaching it using closures to hold state. But I'm wondering if it would make more sense to use a monad. I have only read through 3 or 4 analogies for monads so don't understand them well yet, so feel free to educate me.
If each processor modifies its state in the monad in such a way that its history is kept and it is tied to an id for each processing step. And then somehow the monad is able to switch its state to a past step id and run the system with the monad in that state. And the monad would have some mechanism for (de)serializing itself for storage.
(and given the size of the data... it really shouldn't even all be in memory, which would mean the monad would need to be mapped to disk, cached, etc...)
Is there an existing library/mechanism/approach/concept that has already been done to accomplish or assist in accomplishing what I'm trying to do?

So, I have a system of stateful processors that are chained together. For example, a processor might output the average of its last 10 inputs. It requires state to calculate this average.
First of all, it sounds like what you have are not just "stateful processors" but something like finite-state machines and/or transducers. This is probably a good place to start for research.
I would like to submit values to the system, and get the outputs. I also would like to jump back and restore the state at any time in the past. Ie. I run 1000 values through the system. Now I want to "move" the system back to exactly as it was after I had sent the 500th value through. Then I want to "replay" the system from that point again.
The simplest approach here, of course, is to simply keep a log of all prior states. But since it sounds like you have a great deal of data, the storage needed could easily become prohibitive. I would recommend thinking about how you might construct your processors in a way that could avoid this, e.g.:
If a processor's state can be reconstructed easily from the states of its neighbors a few steps prior, you can avoid logging it directly
If a processor is easily reversible in some situations, you don't need to log those immediately; rewinding can be calculated directly, and logging can be done as periodic snapshots
If you can nail a processor down to a very small number of states, make sure to do so.
If a processor behaves in very predictable ways on certain kinds of input, you can record that as such--e.g., if it idles on numeric input below some cutoff, rather than logging each value just log "idled for N steps".
I also need to be able to persist the historical state to disk so I can restore the whole thing again some time in the future (and still have the move back and replay functions work). And of course, I need to do this with gigabytes of data, and have it be extremely fast :)
Explicit state is your friend. Functions are a convenient way to represent active state machines, but they can't be serialized in any simple way. You want a clean separation of a (basically static) network of processors vs. a series of internal states used by each processor to calculate the next step.
Is there an existing library/mechanism/approach/concept that has already been done to accomplish what I'm trying to do? Does the monad approach make sense? Are there other better/special approaches that would help it do this efficiently especially given the enormous amount of data I have to manage?
If most of your processors resemble finite state transducers, and you need to have processors that take inputs of various types and produce different types of outputs, what you probably want is actually something with a structure based on Arrows, which gives you an abstraction for things that compose "like functions" in some sense, e.g., connecting the input of one processor to the output of another.
Furthermore, as long as you avoid the ArrowApply class and make sure that your state machine model only returns an output value and a new state, you'll be guaranteed to avoid implicit state because (unlike functions) Arrows aren't automatically higher-order.
Given the size of the data... it really shouldn't even all be in memory, which would mean the monad would need to be mapped to disk, cached, etc...
Given a static representation of your processor network, it shouldn't be too difficult to also provide an incremental input/output system that would read the data, serialize/deserialize the state, and write any output.
As a quick, rough starting point, here's an example of probably the simplest version of what I've outlined above, ignoring the logging issue for the moment:
data Transducer s a b = Transducer { curState :: s
, runStep :: s -> a -> (s, b)
}
runTransducer :: Transducer s a b -> [a] -> [b]
runTransducer t [] = (t, [])
runTransducer t (x:xs) = let (s, y) = runStep t (curState t) x
(t', ys) = runTransducer (t { curState = s }) xs
in (t', y:ys)
It's a simple, generic processor with explicit internal state of type s, input of type a, and output of type b. The runTransducer function shoves a list of inputs through, updating the state value manually, and collects a list of outputs.
P.S. -- since you were asking about monads, you might want to know if the example I gave is one. In fact, it's a combination of multiple common monads, though which ones depends on how you look at it. However, I've deliberately avoided treating it as a monad! The thing is, monads capture only abstractions that are in some sense very powerful, but that same power also makes them more resistant in some ways to optimization and static analysis. The main thing that needs to be ruled out is processors that take other processors as input and run them, which (as you can imagine) can create convoluted logic that's nearly impossible to analyze.
So, while the processors probably could be monads, and in some logical sense intrinsically are, it may be more useful to pretend that they aren't; imposing an artificial limitation in order to make static analysis simpler.

Related

Why does FRP consider time as a factor for values?

Behaviors are ubiquitously defined as “time-varying value”s1.
Why? time being the dependency/parameter for varying values is very uncommon.
My intuition for FRP would be to have behaviors as event-varying values instead; it is much more common, much more simple, I wage a much more of an efficient idea, and extensible enough to support time too (tick event).
For instance, if you write a counter, you don't care about time/associated timestamps, you just care about the "Increase-button clicked" and "Decrease-button clicked" events.
If you write a game and want a position/force behavior, you just care about the WASD/arrow keys held events, etc. (unless you ban your players for moving to the left in the afternoon; how iniquitous!).
So: Why time is a consideration at all? why timestamps? why are some libraries (e.g. reactive-banana, reactive) take it up to the extent of having Future, Moment values? Why work with event-streams instead of just responding to an event occurrence? All of this just seems to over-complicate a simple idea (event-varying/event-driven value); what's the gain? what problem are we solving here? (I'd love to also get a concrete example along with a wonderful explanation, if possible).
1 Behaviors have been defined so here, here, here... & pretty much everywhere I've encountered.
Behaviors differ from Events primarily in that a Behavior has a value right now while an Event only has a value whenever a new event comes in.
So what do we mean by "right now"? Technically all changes are implemented as push or pull semantics over event streams, so we can only possibly mean "the most recent value as of the last event of consequence for this Behavior". But that's a fairly hairy concept—in practice "now" is much simpler.
The reasoning for why "now" is simpler comes down to the API. Here are two examples from Reactive Banana.
Eventually an FRP system must always produce some kind of externally visible change. In Reactive Banana this is facilitated by the reactimate :: Event (IO ()) -> Moment () function which consumes event streams. There is no way to have a Behavior trigger external changes---you always have to do something like reactimate (someBehavior <# sampleTickEvent) to sample the behavior at concrete times.
Behaviors are Applicatives unlike Events. Why? Well, let's assume Event was an applicative and think about what happens when we have two event streams f and x and write f <*> x: since events occur all at different times the chances of f and x being defined simultaneously are (almost certainly) 0. So f <*> x would always mean the empty event stream which is useless.
What you really want is for f <*> x to cache the most current values for each and take their combined value "all of the time". That's really confusing concept to talk about in terms of an event stream, so instead lets consider f and x as taking values for all points in time. Now f <*> x is also defined as taking values for all points in time. We've just invented Behaviors.
Because it was the simplest way I could think of to give a precise denotation (implementation-independent meaning) to the notion of behaviors, including the sorts of operations I wanted, including differentiation and integration, as well as tracking one or more other behaviors (including but not limited to user-generated behavior).
Why? time being the dependency/parameter for varying values is very uncommon.
I suspect that you're confusing the construction (recipe) of a behavior with its meaning. For instance, a behavior might be constructed via a dependency on something like user input, possibly with additional synthetic transformation. So there's the recipe. The meaning, however, is simply a function of time, related to the time-function that is the user input. Note that by "function", I mean in the math sense of the word: a (deterministic) mapping from domain (time) to range (value), not in the sense that there's a purely programmatic description.
I've seen many questions asking why time matters and why continuous time. If you apply the simple discipline of giving a mathematical meaning in the style of denotational semantics (a simple and familiar style for functional programmers), the issues become much clearer.
If you really want to grok the essence of and thinking behind FRP, I recommend you read my answer to "Specification for a Functional Reactive Programming language" and follow pointers, including "What is Functional Reactive Programming".
Conal Elliott's Push-Pull FRP paper describes event-varying data, where the only points in time that are interesting are when events occcur. Reactive event-varying data is the current value and the next Event that will change it. An Event is a Future point in the event-varying Reactive data.
data Reactive a = a ‘Stepper ‘ Event a
newtype Event a = Ev (Future (Reactive a))
The Future doesn't need to have a time associated with it, it just need to represent the idea of a value that hasn't happened yet. In an impure language with events, for example, a future can be an event handle and a value. When the event occurs, you set the value and raise the handle.
Reactive a has a value for a at all points in time, so why would we need Behaviors? Let's make a simple game. In between when the user presses the WASD keys, the character, accelerated by the force applied, still moves on the screen. The character's position at different points in time is different, even though no event has occurred in the intervening time. This is what a Behavior describes - something that not only has a value at all points in time, but its value can be different at all points in time, even with no intervening events.
One way to describe Behaviors would be to repeat what we just stated. Behaviors are things that can change in-between events. In-between events they are time-varying values, or functions of time.
type Behavior a = Reactive (Time -> a)
We don't need Behavior, we could simply add events for clock ticks, and write all of the logic in our entire game in terms of these tick events. This is undesirable to some developers as the code declaring what our game is is now intermingled with the code providing how it is implemented. Behaviors allow the developer to separate this logic between the description of the game in terms of time-varying variables and the implementation of the engine that executes that description.

FRP - Event streams and Signals - what is lost in using just signals?

In recent implementations of Classic FRP, for instance reactive-banana, there are event streams and signals, which are step functions (reactive-banana calls them behaviours but they are nevertheless step functions). I've noticed that Elm only uses signals, and doesn't differentiate between signals and event streams. Also, reactive-banana allows to go from event streams to signals (edited: and it's sort of possible to act on behaviours using reactimate' although it not considered good practice), which kind of means that in theory we could apply all the event stream combinators on signals/behaviours by first converting the signal to event stream, applying and then converting again. So, given that it's in general easier to use and learn just one abstraction, what is the advantage of having separated signals and event streams ? Is anything lost in using just signals and converting all the event stream combinators to operate on signals ?
edit: The discussion has been very interesting. The main conclusions that I took from the discussion myself is that behaviours/event sources are both needed for mutually recursive definitions (feedback) and for having an output depend on two inputs (a behaviour and an event source) but only cause an action when one of them changes (<#>).
(Clarification: In reactive-banana, it is not possible to convert a Behavior back to an Event. The stepper function is a one-way ticket. There is a changes function, but its type indicates that it is "impure" and it comes with a warning that it does not preserve the semantics.)
I believe that having two separates concepts makes the API more elegant. In other words, it boils down to a question of API usability. I think that the two concepts behave sufficiently differently that things flow better if you have two separate types.
For example, the direct product for each type is different. A pair of Behavior is equivalent to a Behavior of pairs
(Behavior a, Behavior b) ~ Behavior (a,b)
whereas a pair of Events is equivalent to an Event of a direct sum:
(Event a, Event b) ~ Event (EitherOrBoth a b)
If you merge both types into one, then neither of these equivalence will hold anymore.
However, one of the main reasons for the separation of Event and Behavior is that the latter does not have a notion of changes or "updates". This may seem like an omission at first, but it is extremely useful in practice, because it leads to simpler code. For instance, consider a monadic function newInput that creates an input GUI widget that displays the text indicated in the argument Behavior,
input <- newInput (bText :: Behavior String)
The key point now is that the text displayed does not depend on how often the Behavior bText may have been updated (to the same or a different value), only on the actual value itself. This is a lot easier to reason about than the other case, where you would have to think about what happens when two successive event occurrences have the same value. Do you redraw the text while the user edits it?
(Of course, in order to actually draw the text, the library has to interface with the GUI framework and does keep track of changes in the Behavior. This is what the changes combinator is for. However, this can be seen as an optimization and is not available from "within FRP".)
The other main reason for the separation is recursion. Most Events that recursively depend on themselves are ill-defined. However, recursion is always allowed if you have mutual recursion between an Event and a Behavior
e = ((+) <$> b) <#> einput
b = stepper 0 e
There is no need to introduce delays by hand, it just works out of the box.
Something critically important to me is lost, namely the essence of behaviors, which is (possibly continuous) variation over continuous time.
Precise, simple, useful semantics (independent of a particular implementation or execution) is often lost as well.
Check out my answer to "Specification for a Functional Reactive Programming language", and follow the links there.
Whether in time or in space, premature discretization thwarts composability and complicates semantics.
Consider vector graphics (and other spatially continuous models like Pan's). Just as with premature finitization of data structures as explained in Why Functional Programming Matters.
I don't think there's any benefit to using the signals/behaviors abstraction over elm-style signals. As you point out, it's possible to create a signal-only API on top of the signal/behavior API (not at all ready for use, but see https://github.com/JohnLato/impulse/blob/dyn2/src/Reactive/Impulse/Syntax2.hs for an example). I'm pretty sure it's also possible to write a signal/behavior API on top of an elm-style API as well. That would make the two APIs functionally equivalent.
WRT efficiency, with a signals-only API the system should have a mechanism where only signals that have updated values will cause recomputations (e.g. if you don't move the mouse, the FRP network won't re-calculate the pointer coordinates and redraw the screen). Provided this is done, I don't think there's any loss of efficiency compared to a signals-and-streams approach. I'm pretty sure Elm works this way.
I don't think the continuous-behavior issue makes any difference here (or really at all). What people mean by saying behaviors are continuous over time is that they are defined at all times (i.e. they're functions over a continuous domain); the behavior itself isn't a continuous function. But we don't actually have a way to sample a behavior at any time; they can only be sampled at times corresponding to events, so we can't use the full power of this definition!
Semantically, starting from these definitions:
Event == for some t ∈ T: [(t,a)]
Behavior == ∀ t ∈ T: t -> b
since behaviors can only be sampled at times where events are defined, we can create a new domain TX where TX is the set of all times t at which Events are defined. Now we can loosen the Behavior definition to
Behavior == ∀ t ∈ TX: t -> b
without losing any power (i.e. this is equivalent to the original definition within the confines of our frp system). Now we can enumerate all times in TX to transform this to
Behavior == ∀ t ∈ TX: [(t,b)]
which is identical to the original Event definition except for the domain and quantification. Now we can change the domain of Event to TX (by the definition of TX), and the quantification of Behavior (from forall to for some) and we get
Event == for some t ∈ TX: [(t,a)]
Behavior == for some t ∈ TX: [(t,b)]
and now Event and Behavior are semantically identical, so they could obviously be represented using the same structure in an FRP system. We do lose a bit of information at this step; if we don't differentiate between Event and Behavior we don't know that a Behavior is defined at every time t, but in practice I don't think this really matters much. What elm does IIRC is require both Events and Behaviors to have values at all times and just use the previous value for an Event if it hasn't changed (i.e. change the quantification of Event to forall instead of changing the quantification of Behavior). This means you can treat everything as a signal and it all Just Works; it's just implemented so that the signal domain is exactly the subset of time that the system actually uses.
I think this idea was presented in a paper (which I can't find now, anyone else have a link?) about implementing FRP in Java, perhaps from POPL '14? Working from memory, so my outline isn't as rigorous as the original proof.
There's nothing to stop you from creating a more-defined Behavior by e.g. pure someFunction, this just means that within an FRP system you can't make use of that extra defined-ness, so nothing is lost by a more restricted implementation.
As for notional signals such as time, note that it's impossible to implement an actual continuous-time signal using typical programming languages. Since the implementation will necessarily be discrete, converting that to an event stream is trivial.
In short, I don't think anything is lost by using just signals.
I've unfortunately have no references in mind, but I distinctly remember
different reactive authors claiming this choice is just for efficiency. You expose
both to give the programmer a choice in what implementation of the same idea is
more efficient for your problem.
I might be lying now, but I believe Elm implements everything as event streams under the
hood. Things like time wouldn't be so nice like event streams though, since there are an
infinite amount of events during any time frame. I'm not sure how Elm solves this, but I
think it's a good example on something that makes more sense as a signal, both conceptually
and in implementation.

Do we care about the 'past' in FRP?

When toying around with implementing FRP one thing I've found that is confusing is what to
do with the past? Basically, my understanding was that I would be able to do this with a Behaviour at any point:
beh.at(x) // where time x < now
This seems like it could be problematic performance wise in a case such as this:
val beh = Stepper(0, event) // stepwise behaviour
Here we can see that to evaluate the Behaviour in the past we need to keep all the Events and we will end up performing (at worst) linear scans each time we sample.
Do we want this ability to be available or should Behaviours only be allowed to be evaluated at a time >= now? Do we even want to expose the at function to the programmer?
While a behaviour is considered to be a function of time, reliance on an arbitrary amount of past data in FRP is a Bad Thing, and is referred to as a time leak. That is, transformations on behaviours should generally be streaming/reactive in that they do not rely on more than a bounded amount of the past (and should accumulate this knowledge of the history explicitly).
So, no, at is not desirable in a real FRP system: it should not be possible to look at either the past or the future. (The latter is, of course, impossible, if the state of the future depends on anything external to the FRP system.)
Of course, this leads to the problem that only being able to look at the exact present severely restricts what you can do when writing a function to transform behaviours: Behaviour a -> Behaviour b becomes the same as a -> b, which makes many things we'd like to do impossible. But this is more an issue of finding a semantics, one of FRP's persistent problems, than anything else; as long as the primitive transformations on behaviours you provide are powerful enough without causing time leaks, everything should be fine. For more information on this, see Garbage collecting the semantics of FRP.

Is it possible to make fast big circular buffer arrays for stream recording in Haskell?

I'm considering converting a C# app to Haskell as my first "real" Haskell project. However I want to make sure it's a project that makes sense. The app collects data packets from ~15 serial streams that come at around 1 kHz, loads those values into the corresponding circular buffers on my "context" object, each with ~25000 elements, and then at 60 Hz sends those arrays out to OpenGL for waveform display. (Thus it has to be stored as an array, or at least converted to an array every 16 ms). There are also about 70 fields on my context object that I only maintain the current (latest) value, not the stream waveform.
There are several aspects of this project that map well to Haskell, but the thing I worry about is the performance. If for each new datapoint in any of the streams, I'm having to clone the entire context object with 70 fields and 15 25000-element arrays, obviously there's going to be performance issues.
Would I get around this by putting everything in the IO-monad? But then that seems to somewhat defeat the purpose of using Haskell, right? Also all my code in C# is event-driven; is there an idiom for that in Haskell? It seems like adding a listener creates a "side effect" and I'm not sure how exactly that would be done.
Look at this link, under the section "The ST monad":
http://book.realworldhaskell.org/read/advanced-library-design-building-a-bloom-filter.html
Back in the section called “Modifying array elements”, we mentioned
that modifying an immutable array is prohibitively expensive, as it
requires copying the entire array. Using a UArray does not change
this, so what can we do to reduce the cost to bearable levels?
In an imperative language, we would simply modify the elements of the
array in place; this will be our approach in Haskell, too.
Haskell provides a special monad, named ST, which lets us work
safely with mutable state. Compared to the State monad, it has some
powerful added capabilities.
We can thaw an immutable array to give a mutable array; modify the
mutable array in place; and freeze a new immutable array when we are
done.
...
The IO monad also provides these capabilities. The major difference between the two is that the ST monad is intentionally designed so that we can escape from it back into pure Haskell code.
So should be possible to modify in-place, and it won't defeat the purpose of using Haskell after all.
Yes, you would probably want to use the IO monad for mutable data. I don't believe the ST monad is a good fit for this problem space because the data updates are interleaved with actual IO actions (reading input streams). As you would need to perform the IO within ST by using unsafeIOToST, I find it preferable to just use IO directly. The other approach with ST is to continually thaw and freeze an array; this is messy because you need to guarantee that old copies of the data are never used.
Although evidence shows that a pure solution (in the form of Data.Sequence.Seq) is often faster than using mutable data, given your requirement that data be pushed out to OpenGL, you'll possible get better performance from working with the array directly. I would use the functions from Data.Vector.Storable.Mutable (from the vector package), as then you have access to the ForeignPtr for export.
You can look at arrows (Yampa) for one very common approach to event-driven code. Another area is Functional Reactivity (FRP). There are starting to be some reasonably mature libraries in this domain, such as Netwire or reactive-banana. I don't know if they'd provide adequate performance for your requirements though; I've mostly used them for gui-type programming.

Keeping State in a Purely Functional Language

I am trying to figure out how to do the following, assume that your are working on a controller for a DC motor you want to keep it spinning at a certain speed set by the user,
(def set-point (ref {:sp 90}))
(while true
(let [curr (read-speed)]
(controller #set-point curr)))
Now that set-point can change any time via a web a application, I can't think of a way to do this without using ref, so my question is how functional languages deal with this sort of thing? (even though the example is in clojure I am interested in the general idea.)
This will not answer your question but I want to show how these things are done in Clojure. It might help someone reading this later so they don't think they have to read up on monads, reactive programming or other "complicated" subjects to use Clojure.
Clojure is not a purely functional language and in this case it might be a good idea to leave the pure functions aside for a moment and model the inherent state of the system with identities.
In Clojure, you would probably use one of the reference types. There are several to choose from and knowing which one to use might be difficult. The good news is they all support the unified update model so changing the reference type later should be pretty straight forward.
I've chosen an atom but depending on your requirements it might be more appropriate to use a ref or an agent.
The motor is an identity in your program. It is a "label" for some thing that has different values at different times and these values are related to each other (i.e., the speed of the motor). I have put a :validator on the atom to ensure that the speed never drops below zero.
(def motor (atom {:speed 0} :validator (comp not neg? :speed)))
(defn add-speed [n]
(swap! motor update-in [:speed] + n))
(defn set-speed [n]
(swap! motor update-in [:speed] (constantly n)))
> (add-speed 10)
> (add-speed -8)
> (add-speed -4) ;; This will not change the state of motor
;; since the speed would drop below zero and
;; the validator does not allow that!
> (:speed #motor)
2
> (set-speed 12)
> (:speed #motor)
12
If you want to change the semantics of the motor identity you have at least two other reference types to choose from.
If you want to change the speed of the motor asynchronously you would use an agent. Then you need to change swap! with send. This would be useful if, for example, the clients adjusting the motor speed are different from the clients using the motor speed, so that it's fine for the speed to be changed "eventually".
Another option is to use a ref which would be appropriate if the motor need to coordinate with other identities in your system. If you choose this reference type you change swap! with alter. In addition, all state changes are run in a transaction with dosync to ensure that all identities in the transaction are updated atomically.
Monads are not needed to model identities and state in Clojure!
For this answer, I'm going to interpret "a purely functional language" as meaning "an ML-style language that excludes side effects" which I will interpret in turn as meaning "Haskell" which I'll interpret as meaning "GHC". None of these are strictly true, but given that you're contrasting this with a Lisp derivative and that GHC is rather prominent, I'm guessing this will still get at the heart of your question.
As always, the answer in Haskell is a bit of sleight-of-hand where access to mutable data (or anything with side effects) is structured in such a way that the type system guarantees that it will "look" pure from the inside, while producing a final program that has side effects where expected. The usual business with monads is a large part of this, but the details don't really matter and mostly distract from the issue. In practice, it just means you have to be explicit about where side effects can occur and in what order, and you're not allowed to "cheat".
Mutability primitives are generally provided by the language runtime, and accessed through functions that produce values in some monad also provided by the runtime (often IO, sometimes more specialized ones). First, let's take a look at the Clojure example you provided: it uses ref, which is described in the documentation here:
While Vars ensure safe use of mutable storage locations via thread isolation, transactional references (Refs) ensure safe shared use of mutable storage locations via a software transactional memory (STM) system. Refs are bound to a single storage location for their lifetime, and only allow mutation of that location to occur within a transaction.
Amusingly, that whole paragraph translates pretty directly to GHC Haskell. I'm guessing that "Vars" are equivalent to Haskell's MVar, while "Refs" are almost certainly equivalent to TVar as found in the stm package.
So to translate the example to Haskell, we'll need a function that creates the TVar:
setPoint :: STM (TVar Int)
setPoint = newTVar 90
...and we can use it in code like this:
updateLoop :: IO ()
updateLoop = do tvSetPoint <- atomically setPoint
sequence_ . repeat $ update tvSetPoint
where update tv = do curSpeed <- readSpeed
curSet <- atomically $ readTVar tv
controller curSet curSpeed
In actual use my code would be far more terse than that, but I've left things more verbose here in hopes of being less cryptic.
I suppose one could object that this code isn't pure and is using mutable state, but... so what? At some point a program is going to run and we'd like it to do input and output. The important thing is that we retain all the benefits of code being pure, even when using it to write code with mutable state. For instance, I've implemented an infinite loop of side effects using the repeat function; but repeat is still pure and behaves reliably and nothing I can do with it will change that.
A technique to tackle problems that apparently scream for mutability (like GUI or web applications) in a functional way is Functional Reactive Programming.
The pattern you need for this is called Monads. If you really want to get into functional programming you should try to understand what monads are used for and what they can do. As a starting point I would suggest this link.
As a short informal explanation for monads:
Monads can be seen as data + context that is passed around in your program. This is the "space suit" often used in explanations. You pass data and context around together and insert any operation into this Monad. There is usually no way to get the data back once it is inserted into the context, you just can go the other way round inserting operations, so that they handle data combined with context. This way it almost seems as if you get the data out, but if you look closely you never do.
Depending on your application the context can be almost anything. A datastructure that combines multiple entities, exceptions, optionals, or the real world (i/o-monads). In the paper linked above the context will be execution states of an algorithm, so this is quite similar to the things you have in mind.
In Erlang you could use a process to hold the value. Something like this:
holdVar(SomeVar) ->
receive %% wait for message
{From, get} -> %% if you receive a get
From ! {value, SomeVar}, %% respond with SomeVar
holdVar(SomeVar); %% recursively call holdVar
%% to start listening again
{From, {set, SomeNewVar}} -> %% if you receive a set
From ! {ok}, %% respond with ok
holdVar(SomeNewVar); %% recursively call holdVar with
%% the SomeNewVar that you received
%% in the message
end.

Resources