To generate x86 assembly code, I have defined a custom type called X86:
data X86 a = X86 { code :: String, counter :: Integer, value :: (X86 a -> a) }
This type is used in do-notation like the following. This makes it easy to write templates for generating if-statements, for-loops, etc...
generateCode :: X86 ()
generateCode = do
label1 <- allocateUniqueLabel
label2 <- allocateUniqueLabel
jmp label1
label label1
jmp label2
label label2
Instructions are defined like this:
jmp :: String -> X86 ()
jmp l = X86 { code = "jmp " ++ l ++ ";\n", counter = 0, value = const () }
label :: String -> X86 ()
label l = X86 { code = l ++ ":\n", counter = 0, value = const () }
And the completed assembly file is printed like so:
printAsm :: X86 a -> String
printAsm X86{code=code} = code
main = do
putStrLn (printAsm generateCode)
I implemented the X86 monad in the following manner. Essentially, the sequence operator concatenates blocks of assembly code in order and ensures the counters are incremented.
instance Monad X86 where
x >> y = X86 { code = code x ++ code y, counter = counter x + counter y, value = value y }
x >>= f = x >> y
where y = f (value x x)
The problem is the labels are not incremented properly, so they are not unique! The following is the output:
jmp Label1;
Label1:
jmp Label1;
Label1:
I desire the output to have a unique value for each label:
jmp Label1;
Label1:
jmp Label2;
Label2:
To complete the example, here is the implementation of the allocateUniqueLabel function:
allocateUniqueId :: X86 Integer
allocateUniqueId = X86 { code = "", counter = 1, value = counter }
allocateUniqueLabel :: X86 String
allocateUniqueLabel = do
id <- allocateUniqueId
return ("Label" ++ show id)
How can I fix my X86 monad so the labels are unique?
Here is what I've tried:
Incrementing a global counter. => Haskell does not safely allow global state outside the IO monad.
Using the State monad. => I have looked into a number of examples, but do not understand how to integrate them into my existing X86 monad.
Keeping track of the counter outside of the monad. => I would rather the counter be updated "behind the scenes"; otherwise, a lot of code templates which do not use labels will need to propagate the counter manually.
We can use mtl classes to describe X86 code as effectful programs. We want:
to generate code, this is a Writer effect;
to maintain a counter, this is a State effect.
We worry about instantiating these effects only at the very end; while describing the programs themselves we just use MonadWriter and MonadState constraints.
import Control.Monad.State -- mtl
import Control.Monad.Writer
Allocating a new identifier increments the counter, without generating any code. This only uses the State effect.
type Id = Integer
allocateUniqueLabel :: MonadState Id m => m String
allocateUniqueLabel = do
i <- get
put (i+1) -- increment
return ("Label" ++ show (i+1))
And of course, we have actions to generate code, that don't need to care about the current state. So they use the Writer effect.
jmp :: MonadWriter String m => String -> m ()
jmp l = tell ("jmp " ++ l ++ ";\n")
label :: MonadWriter String m => String -> m ()
label l = tell (l ++ ":\n")
The actual program looks the same as the original, but with more general types.
generateCode :: (MonadState Id m, MonadWriter String m) => m ()
generateCode = do
label1 <- allocateUniqueLabel
label2 <- allocateUniqueLabel
jmp label1
label label1
jmp label2
label label2
The effects are instantiated when we run this program, here using runWriterT/runWriter and runStateT/runState (the order doesn't matter much, these two effects commute).
type X86 = WriterT String (State Id)
runX86 :: X86 () -> String
runX86 gen = evalState (execWriterT gen) 1 -- start counting from 1
-- evalState and execWriterT are wrappers around `runStateT` and `runWriterT`:
-- - execWriterT: discards the result (of type ()), only keeping the generated code.
-- - evalState: discards the final state, only keeping the generated code,
-- and does some unwrapping after there are no effects to handle.
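For completeness, a tiny driver along these lines (a sketch, assuming generateCode and runX86 as defined above) could be:
main :: IO ()
main = putStr (runX86 generateCode)
-- should print something like:
-- jmp Label1;
-- Label1:
-- jmp Label2;
-- Label2: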
You probably want to use this monad stack:
type X86 a = StateT Integer (Writer String) a
Since you have a state and a writer, you could also consider using RWS (reader-writer-state all in one):
type X86 a = RWS () String Integer a
Let's pick the first one for fun. I'd first define a helper function to increment the counter (monads cannot lawfully increment a counter "automatically"):
instr :: X86 a -> X86 a
instr i = do
x <- i
modify (+1)
return x
Then you could define jmp as:
jmp :: String -> X86 ()
jmp l = instr $ do
lift (tell ("jmp " ++ l ++ ";\n"))
-- 'tell' is one of Writer's operations, and then we 'lift'
-- it into StateT
(The do there is superfluous, however I suspect there will be a pattern of starting instruction definitions with instr $ do)
I would not roll my own monad for this -- it can be instructive to do so, but I think you'll get more mileage using the standard libraries for this one.
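One possible way to round this stack out (a sketch, assuming the jmp/instr definitions above; here the counter is only bumped when a label is allocated, which is one reasonable choice) is:
import Control.Monad.State
import Control.Monad.Writer

-- Restating the alias from above so this sketch stands alone.
type X86 a = StateT Integer (Writer String) a

allocateUniqueLabel :: X86 String
allocateUniqueLabel = do
  i <- get                    -- read the current counter
  modify (+ 1)                -- bump it so the next label differs
  return ("Label" ++ show i)

runX86 :: X86 a -> String
runX86 m = execWriter (evalStateT m 1)  -- start the counter at 1
-- putStr (runX86 generateCode) would then print code with unique labels.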
As you probably now understand from the other answers, the problem with your
approach was that even though you were using the counter, you were still
generating your labels locally. In particular
label1 <- allocateUniqueLabel
label label1
was equivalent to
X86 { code = "Label1:\n", counter = 1, value = const () }
We need to assemble the whole code first, generate the labels, and only
afterwards (in some sense) generate the actual code using the labels.
And this is what the other answers are suggesting by having the counter stored
in the State (or RWS) monad.
There is yet another issue that we can address: You want to be able to jump both
forwards and backwards. This is most likely why you have separate
allocateUniqueLabel and label functions. But this allows the same label to be
set twice.
It is actually possible to use do notation with "backwards" binding using
MonadFix, which defines this monadic operation:
mfix :: (a -> m a) -> m a
Since both State and RWS have MonadFix instances, we can indeed write code
like this:
{-# LANGUAGE GeneralizedNewtypeDeriving, RecursiveDo #-}
module X86
( X86()
, runX86
, label
, jmp
) where
import Control.Monad.RWS
-- In production code it'll be much faster if we replace String with
-- ByteString.
newtype X86 a = X86 (RWS () String Int a)
deriving (Functor, Applicative, Monad, MonadFix)
runX86 :: X86 a -> String
runX86 (X86 k) = snd (execRWS k () 1)
newtype Label = Label { getLabel :: String }
label :: X86 Label
label = X86 $ do
counter <- get
let l = "Label" ++ show counter
tell (l ++ ":\n")
modify (+1)
return (Label l)
jmp :: Label -> X86 ()
jmp (Label l) = X86 . tell $ "jmp " ++ l ++ ";\n"
And use it like this:
example :: X86 ()
example = do
rec l1 <- label
jmp l2
l2 <- label
jmp l1
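A small driver for this example (a sketch, assuming runX86 and example as above) could be:
main :: IO ()
main = putStr (runX86 example)
-- which should print something like:
-- Label1:
-- jmp Label2;
-- Label2:
-- jmp Label1;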
There are a few things to note:
We need to use the RecursiveDo extension to enable the rec keyword.
Keyword rec delimits a block of mutually recursive definitions. In our case
it could also start one line later (rec jmp l2). GHC then translates it into
using mfix internally. (Using the deprecated mdo keyword instead of rec
would make the code somewhat more natural.)
We wrap the internals in the X86 newtype. First, it's always good to hide the
internal implementation; it allows easy refactoring later. Second, mfix
requires that the function a -> m a passed to it isn't strict in its
argument. The effect must not depend on the argument, otherwise mfix
diverges. This condition is satisfied for our functions, but if the
internals were exposed, someone could define a contrived function like this:
-- | Reset the counter to the specified label.
evilReset :: Label -> X86 ()
evilReset = X86 . put . read . drop 5 . getLabel
Not only does it break the uniqueness of labels, it also causes the following code
to hang:
diverge :: X86 ()
diverge = do
rec evilReset l2
l2 <- label
return ()
Another quite similar alternative would be to use the Rand monad and generate
labels with the Random instance of UUID. Something like WriterT String Rand a,
which also has a MonadFix instance.
(From a purely academic perspective, it might be possible to construct an arrow instead of a monad that would implement ArrowLoop but disallow state modifications that depend on values, such as in evilReset. But the encapsulation of X86 achieves the same goal while keeping the much friendlier do syntax.)
Related
I'm trying to find out why the following code has a memory leak:
module Main where
import System.IO
func :: Int -> Int -> ([Int], Int)
func input 0 = ([], input)
func input numTimes = do
let (rest, ret) = func (input + 1) (numTimes - 1)
((input : rest), ret)
main :: IO ()
main = do
hSetBuffering stdout LineBuffering
let (list, final) = func 0 10000000000
listStr = map (\x -> (show x) ++ "\n") list
putStr (foldr (++) "" listStr)
putStr (show final)
printStrs :: [String] -> String -> IO ()
printStrs [] str = do
putStrLn str
printStrs (first : rest) str = do
putStr first
printStrs rest str
When I compile it with ghc --make Main and run it, the top command shows it eating up more and more memory even though the amount of memory it uses should be constant because of lazy evaluation. I've tried using the printStrs function I wrote instead, and it still eats up all memory. I've tried using ghci on the code and using :sprint to print out the thunks from func and it seems like the thunks aren't increasing the amount of memory used for each evaluation of an element in the list.
I honestly don't know what else to do.
The problem is that func will build a huge list and laziness will not be able to avoid it. It reminds me of continuation passing, where the order of computations is sequentialized.
I think the part with foldr is responsible for the memory consumption. By avoiding it and compiling with ghc -O3, the memory usage is constant in my test:
module Main where
import System.IO
func :: Int -> Int -> ([Int], Int)
func input 0 = ([], input)
func input numTimes = do
let (rest, ret) = func (input + 1) (numTimes - 1)
((input : rest), ret)
main :: IO ()
main = do
hSetBuffering stdout LineBuffering
let (list, final) = func 0 10000000000
mapM_ (putStrLn . show) list
putStr (show final)
In ghci it still blows up the memory, but that might be because the interpreter is not able to optimize the recursion away.
You mention in a comment that
I just want to get intermediate values so I can have some idea of how much progress the program is making and then take the final value as a separate return value
Let's try defining a special-purpose datatype which models the idea of "inspect some bit of progress, or get hold of the final result, if we have finished". Something like
{-# LANGUAGE DeriveFunctor #-}
data Progress a r = Emit a (Progress a r) | Result r
deriving Functor -- maps over the result value r, not over the as
Notice that, unlike ([Int], Int), Progress doesn't give us "direct" access to the final result until we have gone through all the nested Emit constructors. Hopefully this will help us avoid unexpected dependencies between thunks.
Now let's define func like this:
{-# LANGUAGE BangPatterns #-}
func :: Int -> Int -> Progress Int Int
func input 0 =
Result input
-- the bang avoids the accumulation of thunks behind the input param
func !input numTimes =
Emit input (func (input + 1) (numTimes - 1))
Notice that we don't need to go through all the recursive calls to get hold of the first progress "notification". Even if numTimes is 10000000000, we can pattern-match on the outermost Emit constructor after the first iteration!
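For instance, forcing just the first emitted value returns immediately (a small illustration, assuming the func and Progress definitions above):
-- Only the outermost Emit is forced, so this is O(1) even for a huge count.
firstEmitted :: Int
firstEmitted = case func 0 10000000000 of
  Emit x _ -> x
  Result r -> r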
The disadvantage of the Progress a r datatype is that we can't easily use regular list functions to print the progress. But we can define our own:
printProgress :: Show a => Progress a r -> IO r
printProgress (Result r) =
pure r
printProgress (Emit a rest) =
do print a
printProgress rest
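A possible driver (a sketch, assuming the func and printProgress above) that streams the progress of a short run and then prints the final result:
main :: IO ()
main = do
  finalValue <- printProgress (func 0 10)  -- prints 0 through 9 as progress
  print finalValue                         -- then prints the result, 10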
In practice, we often also want to be able to perform monadic effects at each "step". At that point, it's common to turn to some streaming library like streaming. If you squint a little, the Stream type from "streaming" and similar libraries is basically an effectful list which returns a special result after reaching the end.
I'm learning Monad Transformers and decided to write an interpreter for a simple language (with loop constructs) similar to Brainfuck using Monad Transformers. I would like to terminate the interpreter after a certain number of statements.
This simple language is made of a single memory cell capable of holding an Int and 5 instructions: Input, Output, Increment, Decrement and Loop. A loop terminates when the value in memory is zero. Input is read from a list and similarly output is written to another list. Increment and Decrement do +1 and -1 to the memory, respectively.
I'm using the World type to keep track of input, output (streams) and memory, Sum Int to count the number of instructions evaluated, and Except World to terminate evaluation after a certain number of statements.
module Transformers where
import qualified Data.Map as Map
import Data.Maybe
import Control.Monad.State.Lazy
import Control.Monad.Writer.Lazy
import Control.Monad.Except
data Term = Input
| Output
| Increment
| Decrement
| Loop [Term]
deriving (Show)
data World = World {
inp :: [Int],
out :: [Int],
mem :: Int
} deriving Show
op_limit = 5
loop
:: [Term]
-> StateT World (WriterT (Sum Int) (Except World)) ()
-> StateT World (WriterT (Sum Int) (Except World)) ()
loop terms sp = sp >> do
s <- get
if mem s == 0 then put s else loop terms (foldM (\_ t -> eval t) () terms)
limit :: StateT World (WriterT (Sum Int) (Except World)) ()
limit = do
(s, count) <- listen get
when (count >= op_limit) $ throwError s
tick :: StateT World (WriterT (Sum Int) (Except World)) ()
tick = tell 1
eval :: Term -> StateT World (WriterT (Sum Int) (Except World)) ()
eval Input =
limit >> tick >> modify (\s -> s { inp = tail (inp s), mem = head (inp s) })
eval Output = limit >> tick >> modify (\s -> s { out = mem s : out s })
eval Increment = limit >> tick >> modify (\s -> s { mem = mem s + 1 })
eval Decrement = limit >> tick >> modify (\s -> s { mem = mem s - 1 })
eval (Loop terms) = loop terms (void get)
type Instructions = [Term]
interp :: Instructions -> World -> Either World (World, Sum Int)
interp insts w =
let sp = foldM (\_ inst -> eval inst) () insts
in runExcept (runWriterT (execStateT sp w))
Example run in ghci:
*Transformers> interp [Loop [Output, Decrement]] $ World [] [] 5
Right (World {inp = [], out = [1,2,3,4,5], mem = 0},Sum {getSum = 10})
The limit action should, based on count, decide to either fail with the current state or do nothing. But I noticed that count in (s, count) <- listen get is always zero. I don't understand why this is happening. Please help me understand where I went wrong.
Is my ordering of transformers in the stack correct? Are there any rules (informal) to decide the layering?
Computations inside the Writer monad can't have access to their own accumulator. What's more: the accumulator is never forced while the computation runs, not even to WHNF. This applies to both the strict and lazy variants of Writer—the strict variant is strict in a sense unrelated to the accumulator. This unavoidable laziness in the accumulator can be a source of space leaks if the computation runs for too long.
Your limit function is not branching on the value of the "mainline" WriterT accumulator. The get action (you are using mtl) simply reads the state from the StateT layer, and performs no effects in the other layers: it adds mempty to its WriterT accumulator and throws no error.
Then, the listen extracts the Writer accumulator of the get action (only of the get, not of the whole computation) and adds it to the "mainline" accumulator. But this extracted value (the one returned in the tuple) will always be mempty, that is, Sum 0!
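A tiny standalone illustration of that point, using plain Writer (the names here are just for the example):
import Control.Monad.Writer
import Data.Monoid (Sum(..))

demo :: Writer (Sum Int) Int
demo = do
  tell (Sum 1)                  -- the "mainline" accumulator grows...
  (_, w) <- listen (return ())  -- ...but listen only sees what 'return ()' told: mempty
  return (getSum w)

-- runWriter demo == (0, Sum 1)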
Instead of WriterT, you could put the counter in the StateT state, as #chi has mentioned. You could also use AccumT, which is very similar to WriterT but lets you inspect the accumulator (it also lets you force it to WHNF using bang patterns).
AccumT doesn't seem to have a corresponding mtl typeclass though, so you'll need to sprinkle a few lifts in order to use it.
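A rough sketch of the first suggestion (assuming the World record and the op_limit value from the question; Counted, Eval, tick' and limit' are illustrative names): the counter can live next to World in the StateT state, where limit can actually read it.
import Control.Monad (when)
import Control.Monad.State.Lazy
import Control.Monad.Except

-- As in the question.
data World = World { inp :: [Int], out :: [Int], mem :: Int } deriving Show

-- State now carries the instruction count alongside the world.
data Counted = Counted { cWorld :: World, cCount :: Int }

type Eval = StateT Counted (Except World)

opLimit :: Int
opLimit = 5

tick' :: Eval ()
tick' = modify (\s -> s { cCount = cCount s + 1 })

limit' :: Eval ()
limit' = do
  Counted w c <- get
  when (c >= opLimit) (throwError w)  -- the count is now visible to limit'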
Being quite new to Haskell, I'm currently trying to improve my skills by writing an interpreter for a simple imperative toy language.
One of the expressions in this language is input, which reads a single integer from standard input. However, when I assign the value of this expression to a variable and then use this variable later, it seems to me that I actually stored the computation of reading a value rather than the read value itself. This means that e.g. the statements
x = input;
y = x + x;
will cause the interpreter to invoke the input procedure three times rather than once.
Internally in the evaluator module, I use a Map to store the values of variables. Because I need to deal with IO, this gets wrapped in an IO monad, as immortalized in the following minimal example:
import qualified Data.Map as Map
type State = Map.Map String Int
type Op = Int -> Int -> Int
input :: String -> IO State -> IO State
input x state = do line <- getLine
st <- state
return $ Map.insert x (read line) st
get :: String -> IO State -> IO Int
get x state = do st <- state
return $ case Map.lookup x st of
Just i -> i
eval :: String -> Op -> String -> IO State -> IO Int
eval l op r state = do i <- get l state
j <- get r state
return $ op i j
main :: IO ()
main = do let state = return Map.empty
let state' = input "x" state
val <- eval "x" (+) "x" state'
putStrLn . show $ val
The second line in the main function simulates the assignment of x, while the third line simulates the evaluation of the binary + operator.
My question is: How do I get around this, such that the code above only inputs once? I suspect that it is the IO-wrapping that causes the problem, but as we're dealing with IO I see no way out of that..?
Remember that IO State is not an actual state, but instead the specification for an IO machine which eventually produces a State. Let's consider input as an IO-machine transformer
input :: String -> IO State -> IO State
input x state = do line <- getLine
st <- state
return $ Map.insert x (read line) st
Here, provided a machine for producing a state, we create a bigger machine which takes that passed state and adds a read from an input line. Again, to be clear, input name st is an IO-machine which is a slight modification of the IO-machine st.
Let's now examine get
get :: String -> IO State -> IO Int
get x state = do st <- state
return $ case Map.lookup x st of
Just i -> i
Here we have another IO-machine transformer. Given a name and an IO-machine which produces a State, get will produce an IO-machine which returns a number. Note again that get name st is fixed to always use the state produced by the particular IO-machine st it was given.
Let's combine these pieces in eval
eval :: String -> Op -> String -> IO State -> IO Int
eval l op r state = do i <- get l state
j <- get r state
return $ op i j
Here we call get l and get r each on the same IO-machine state and thus produce two (completely independent) IO-machines get l state and get r state. We then evaluate their IO effects one after another and return the op-combination of their results.
Let's examine the kinds of IO-machines built in main. In the first line we produce a trivial IO-machine, called state, written return Map.empty. This IO-machine, each time it's run, performs no side effects in order to return a fresh, blank Map.Map.
In the second line, we produce a new kind of IO-machine called state'. This IO-machine is based off of the state IO-machine, but it also requests an input line. Thus, to be clear, each time state' runs, a fresh Map.Map is generated and then an input line is read and parsed as an Int, which is stored at "x".
It should be clear where this is going, but now when we examine the third line we see that we pass state', the IO-machine, into eval. Previously we stated that eval runs its input IO-machine twice, once for each name, and then combines the results. By this point it should be clear what's happening.
All together, we build a certain kind of machine which draws input and reads it as an integer, assigning it to a name in a blank Map.Map. We then build this IO-machine into a larger one which uses the first IO-machine twice, in two separate invocations, in order to collect data and combine it with an Op.
Finally, we run this eval machine using do notation (the (<-) arrow indicates running the machine). Clearly it should collect two separate lines.
So what do we really want to do? Well, we need to simulate ambient state in the IO monad, not just pass around Map.Maps. This is easy to do by using an IORef.
import Data.IORef
input :: IORef State -> String -> IO ()
input ref name = do
line <- getLine
modifyIORef ref (Map.insert name (read line))
eval :: IORef State -> Op -> String -> String -> IO Int
eval ref op l r = do
stateSnapshot <- readIORef ref
let Just i = Map.lookup l stateSnapshot
Just j = Map.lookup r stateSnapshot
return (op i j)
main = do
st <- newIORef Map.empty -- create a blank state, embedded into IO, not a value
input st "x" -- request input *once*
val <- eval st (+) "x" "x" -- compute the op
putStrLn . show $ val
It's fine to wrap your actions such as getLine in IO, but to me it looks like your problem is that you're trying to pass your state in the IO monad. Instead, I think this is probably time you get introduced to monad transformers and how they'll let you layer the IO and State monads to get the functionality of both in one.
Monad transformers are a pretty complex topic and it'll take a while to get to where you're comfortable with them (I'm still learning new things all the time about them), but they're a very useful tool when you need to layer multiple monads. You'll need the mtl library to follow this example.
First, imports
import qualified Data.Map as Map
import Control.Monad.State
Then types
type Op = Int -> Int -> Int
-- Renamed to not conflict with Control.Monad.State.State
type AppState = Map.Map String Int
type Interpreter a = StateT AppState IO a
Here Interpreter is the Monad in which we'll build our interpreter. We also need a way to run the interpreter
-- A utility function for kicking off an interpreter
runInterpreter :: Interpreter a -> IO a
runInterpreter interp = evalStateT interp Map.empty
I figured defaulting to Map.empty was sufficient.
Now, we can build our interpreter actions in our new monad. First we start with input. Instead of returning our new state, we just modify what is current in our map:
input :: String -> Interpreter ()
input x = do
-- IO actions have to be passed to liftIO
line <- liftIO getLine
-- modify is a member of the MonadState typeclass, which StateT implements
modify (Map.insert x (read line))
I had to rename get so that it didn't conflict with get from Control.Monad.State, but it does basically the same thing as before, it just takes our map and looks up that variable in it.
-- Had to rename to not conflict with Control.Monad.State.get
-- Also returns Maybe Int because it's safer
getVar :: String -> Interpreter (Maybe Int)
getVar x = do
-- get is a member of MonadState
vars <- get
return $ Map.lookup x vars
-- or
-- getVar x = fmap (Map.lookup x) get
Next, eval now just looks up each variable in our map, then uses liftM2 to keep the return value as Maybe Int. I prefer the safety of Maybe, but you can rewrite it if you prefer.
eval :: String -> Op -> String -> Interpreter (Maybe Int)
eval l op r = do
i <- getVar l
j <- getVar r
-- liftM2 op :: Maybe Int -> Maybe Int -> Maybe Int
return $ liftM2 op i j
Finally, we write our sample program. It stores user input to the variable "x", adds it to itself, and prints out the result.
-- Now we can write our actions in our own monad
program :: Interpreter ()
program = do
input "x"
y <- eval "x" (+) "x"
case y of
Just y' -> liftIO $ putStrLn $ "y = " ++ show y'
Nothing -> liftIO $ putStrLn "Error!"
-- main is kept very simple
main :: IO ()
main = runInterpreter program
The basic idea is that there is a "base" monad, here IO, and these actions are "lifted" up to the "parent" monad, here StateT AppState. There is a typeclass implementation for the different state operations get, put, and modify in the MonadState typeclass, which StateT implements, and in order to lift IO actions there's a pre-made liftIO function that "lifts" IO actions to the parent monad. Now we don't have to worry about passing around our state explicitly, we can still perform IO, and it has even simplified the code!
I would recommend reading the Real World Haskell chapter on monad transformers to get a better feel for them. There are other useful ones as well, such as ExceptT for handling errors, ReaderT for static configuration, WriterT for aggregating results (usually used for logging), and many others. These can be layered into what is called a transformer stack, and it's not too difficult to make your own either.
Instead of passing an IO State, you can pass State and then use higher-level functions to deal with IO. You can go further and make get and eval free from side-effects:
input :: String -> State -> IO State
input x state = do
line <- getLine
return $ Map.insert x (read line) state
get :: String -> State -> Int
get x state = case Map.lookup x state of
Just i -> i
eval :: String -> Op -> String -> State -> Int
eval l op r state = let i = get l state
j = get r state
in op i j
main :: IO ()
main = do
let state = Map.empty
state' <- input "x" state
let val = eval "x" (+) "x" state'
putStrLn . show $ val
If you're actually building an interpreter, you'll presumably have a list of instructions to execute at some point.
This is my rough translation of your code (although I'm only a beginner myself)
import Data.Map (Map, empty, insert, (!))
import Control.Monad (foldM)
type ValMap = Map String Int
instrRead :: String -> ValMap -> IO ValMap
instrRead varname mem = do
putStr "Enter an int: "
line <- getLine
let intval = (read line)::Int
return $ insert varname intval mem
instrAdd :: String -> String -> String -> ValMap -> IO ValMap
instrAdd varname l r mem = do
return $ insert varname result mem
where result = (mem ! l) + (mem ! r)
apply :: ValMap -> (ValMap -> IO ValMap) -> IO ValMap
apply mem instr = instr mem
main = do
let mem0 = empty
let instructions = [ instrRead "x", instrAdd "y" "x" "x" ]
final <- foldM apply mem0 instructions
print (final ! "y")
putStrLn "done"
foldM applies a function (apply) to a start value (mem0) and a list (instructions), but does so within a monad.
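If foldM is unfamiliar, here is a tiny standalone illustration of how it threads a value through monadic steps (sumWithProgress is just an example name):
import Control.Monad (foldM)

-- Sum a list in IO, printing each running total as a side effect.
sumWithProgress :: [Int] -> IO Int
sumWithProgress = foldM step 0
  where
    step acc x = do
      let acc' = acc + x
      print acc'   -- effect performed at every fold step
      return acc'

-- sumWithProgress [1,2,3] prints 1, 3, 6 and returns 6.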
I collect realtime signals, compute derived signals and store both raw and derived data
in a circular buffer, so I hold only the last million samples.
Sometimes I need to serialize current values for all signals. So I need something like:
type D0 a = M.Map SignalType (D1 a)
data D1 a = D1
{ foo :: M.Map DoorType a
, bar :: D2 a
, baz :: a
}
data D2 a = D2
{
quux :: a
, zoo :: a
}
type MyData = D0 SignalBuffer
type CurrentSignals = D0 SignalValue
SignalBuffer is a sequence of SignalValue. It can be an unboxed array of floats. Haskell can derive Functor instances for me, so I can use fmap to fetch last SignalValue from every SignalBuffer and pass the structure to Aeson to serialize.
How do I implement a circular buffer API for SignalBuffer so I can push new values to all the buffers when new ticks arrive? I'd like to conserve memory, so I think I have to use unboxed arrays. Is it advantageous to use mutable unboxed arrays (STUArray?) so array updates don't pile up in memory? Is it possible to use mutable arrays in this setting at all? I'm ready to change MyData and CurrentSignals to whatever does the job.
I know how to implement circular buffers, the question is how to elegantly apply the updates to MyData.
I'm thinking of something like
type UpdateFunc a = MyData -> SignalValue -> Modifier SignalBuffer
updateAllBuffers :: D0 UpdateFunc -> Modifier MyData
Some signals are "convolutions" of other signals (not real convolutions, but a similar kind of processing). To update a buffer for a signal I need to access buffers of other signals - that's why UpdateFunc accepts MyData and SignalValue and returns a buffer modification function.
updateAllBuffers then "zips" D0 UpdateFunc and MyData to get new MyData.
Of course I'm ready to use whatever Modifier fits my task - it can be a function, a monadic value etc.
I do not entirely understand what you are trying to do with the code above, but you can use IOVector from Data.Vector.Unboxed.Mutable for a high-performance array to make a circular buffer:
{-# LANGUAGE GADTs #-}
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import Data.Vector.Unboxed.Mutable (IOVector, Unbox)
import qualified Data.Vector.Unboxed.Mutable as V
data CircularBuffer a where
CircularBuffer :: Unbox a =>
{ start :: IORef Int -- index for getting an element
, end :: IORef Int -- index for putting an element
, array :: IOVector a
} -> CircularBuffer a
newCircularBuffer :: (Unbox a) => Int -> IO (CircularBuffer a)
newCircularBuffer size = CircularBuffer <$> newIORef 0 <*> newIORef 0 <*> V.new size
putCircularBuffer :: Unbox a => a -> CircularBuffer a -> IO ()
putCircularBuffer newEndValue (CircularBuffer _start end array) = do
endIndex <- readIORef end
V.write array endIndex newEndValue
writeIORef end $! (endIndex + 1) `mod` V.length array
getCircularBuffer :: Unbox a => CircularBuffer a -> IO a
getCircularBuffer (CircularBuffer start _end array) = do
startIndex <- readIORef start
startValue <- V.read array startIndex
writeIORef start $! (startIndex + 1) `mod` V.length array
pure startValue
You can then make a map-like function (it would be in IO, though) that applies a function to every item in the CircularBuffer's array.
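Such a helper might look like this (a rough sketch only, reusing the CircularBuffer type and imports above; note that with this representation start == end is ambiguous between empty and full, and it is treated as empty here):
-- Apply a pure function in place to every element currently in the buffer.
mapCircularBuffer :: Unbox a => (a -> a) -> CircularBuffer a -> IO ()
mapCircularBuffer f (CircularBuffer startRef endRef arr) = do
  s <- readIORef startRef
  e <- readIORef endRef
  let n = V.length arr
      go i
        | i == e    = pure ()
        | otherwise = V.modify arr f i >> go ((i + 1) `mod` n)
  go s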
I am writing a compiler for a small imperative language. The target language is Java bytecode, and the compiler is implemented in Haskell.
I've written a frontend for the language - i.e I have a lexer, parser and typechecker. I'm having trouble figuring out how to do code generation.
I keep a data structure representing the stack of local variables. I can query this structure with the name of a local variable and get its position in the stack. This data structure is passed around as I walk the syntax tree, and variables are popped and pushed as I enter and exit new scopes.
What I having trouble figuring out is how to emit the bytecode. Emitting strings at terminals and concatenating them at higher levels seems like a poor solution, both clarity- and performance-wise.
tl;dr How do I emit bytecode while walking the syntax tree?
My first project in Haskell a few months back was to write a C compiler, and what resulted was a fairly naive approach to code generation, which I'll walk through here. Please do not take this as an example of good design for a code generator, but rather view it as a quick and dirty (and ultimately naive) way to get something that works fairly quickly with decent performance.
I began by defining an intermediate representation LIR (Lower Intermediate Representation) which closely corresponded to my instruction set (x86_64 in my case):
data LIRInst = LIRRegAssignInst LIRReg LIRExpr
| LIRRegOffAssignInst LIRReg LIRReg LIRSize LIROperand
| LIRStoreInst LIRMemAddr LIROperand
| LIRLoadInst LIRReg LIRMemAddr
| LIREnterInst LIRInt
| LIRJumpLabelInst LIRLabel
| LIRIfInst LIRRelExpr LIRLabel LIRLabel -- false, then true
| LIRCallInst LIRLabel LIRLabel -- method label, return label
| LIRCalloutInst String
| LIRRetInst [LIRLabel] String -- list of successors, and the name of the method returning from
| LIRLabelInst LIRLabel
deriving (Show, Eq, Typeable)
Next up came a monad that would handle interleaving state throughout the translation (I was blissfully unaware of our friend, the State monad, at the time):
newtype LIRTranslator a = LIRTranslator
{ runLIR :: Namespace -> (a, Namespace) }
instance Monad LIRTranslator where
return a = LIRTranslator (\s -> (a, s))
m >>= f = LIRTranslator (\s ->
let (a, s') = runLIR m s
in runLIR (f a) s')
along with the state that would be 'threaded' through the various translation phases:
data Namespace = Namespace
{ temp :: Int -- id's for new temporaries
, labels :: Int -- id's for new labels
, scope :: [(LIRLabel, LIRLabel)] -- current program scope
, encMethod :: String -- current enclosing method
, blockindex :: [Int] -- index into the SymbolTree
, successorMap :: Map.Map String [LIRLabel]
, ivarStack :: [(LIRReg, [CFGInst])] -- stack of ivars (see motioned code)
}
For convenience, I also specified a series of translator monadic functions, for example:
-- |Increment our translator's label counter
incLabel :: LIRTranslator Int
incLabel = LIRTranslator (\ns@(Namespace{ labels = l }) -> (l, ns{ labels = (l+1) }))
I then proceeded to recursively pattern-match my AST, fragment-by-fragment, resulting in many functions of the form:
translateBlock :: SymbolTree -> ASTBlock -> LIRTranslator [LIRInst]
translateBlock st (DecafBlock _ [] _) = withBlock (return [])
translateBlock st block =
withBlock (do b <- getBlock
let st' = select b st
declarations <- mapM (translateVarDeclaration st') (blockVars block)
statements <- mapM (translateStm st') (blockStms block)
return (concat declarations ++ concat statements))
(for translating a block of the target language's code) or
-- | Given a SymbolTree, Translate a single DecafMethodStm into [LIRInst]
translateStm st (DecafMethodStm mc _) =
do (instructions, operand) <- translateMethodCall st mc
final <- motionCode instructions
return final
(for translating a method call) or
translateMethodPrologue :: SymbolTree -> DecafMethod -> LIRTranslator [LIRInst]
translateMethodPrologue st (DecafMethod _ ident args _ _) =
do let numRegVars = min (length args) 6
regvars = map genRegVar (zip [LRDI, LRSI, LRDX, LRCX, LR8, LR9] args)
stackvars <- mapM genStackVar (zip [1..] (drop numRegVars args))
return (regvars ++ stackvars)
where
genRegVar (reg, arg) =
LIRRegAssignInst (symVar arg st) (LIROperExpr $ LIRRegOperand reg)
genStackVar (index, arg) =
do let mem = LIRMemAddr LRBP Nothing ((index + 1) * 8) qword -- ^ [rbp] = old rbp; [rbp + 8] = ret address; [rbp + 16] = first stack param
return $ LIRLoadInst (symVar arg st) mem
for an example of actually generating some LIR code. Hopefully these three examples will give you a good starting point; ultimately, you'll want to go slowly, focusing on one fragment (or intermediate type) within your AST at a time.
If you haven't done this before, you can do it in small passes:
1) for every statement produce some bytecode (without properly addressed memory locations)
2) after that is done, if you have looping, gotos, etc., put in the real addresses (you know them now that you have it all laid out)
3) replace the memory fetches/stores with the correct locations
4) dump it out to a JAR file
Note that this is very simplified and doesn't try to do any performance optimisation. It will give you a functional program which will execute. This also assumes you know the opcodes for the JVM (which is where I presume you are going to execute it).
To start, just have a subset of the language which does sequential arithmetic statements. This will allow you to figure out how to map variable memory locations to statements via the parse tree. Next add some looping to get jumps to work. Similarly add conditionals. Finally, you can add the final parts of your language.
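As a toy illustration of steps 1 and 2 (a sketch, not JVM-specific; Instr and Resolved are made-up types), you can first emit instructions that refer to labels symbolically and then patch in real addresses once the whole layout is known:
import qualified Data.Map as Map

-- Pass 1 output: instructions that still refer to labels by name.
data Instr = Push Int | Add | Label String | Jump String deriving Show

-- Pass 2 output: jumps carry concrete instruction addresses.
data Resolved = RPush Int | RAdd | RJump Int deriving Show

resolve :: [Instr] -> [Resolved]
resolve instrs = concatMap patch instrs
  where
    -- First walk the layout to record the address each label lands on
    -- (labels occupy no space in this toy model).
    addrs = go 0 instrs
    go _ [] = Map.empty
    go a (Label l : rest) = Map.insert l a (go a rest)
    go a (_ : rest) = go (a + 1) rest
    -- Then rewrite each instruction, resolving symbolic jumps.
    patch (Push n)  = [RPush n]
    patch Add       = [RAdd]
    patch (Label _) = []
    patch (Jump l)  = [RJump (addrs Map.! l)]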