I'm trying to write a renamer for a compiler that I'm writing in Haskell.
The renamer scans an AST looking for symbol DEFs, which it enters into a symbol table, and symbol USEs, which it resolves by looking in the symbol table.
In this language, uses can come before or after defs, so it would seem that a 2 pass strategy is required; one pass to find all the defs and build the symbol table, and a second to resolve all the uses.
However, since Haskell is lazy (like me), I figure I can tie-the-knot and pass the renamer the final symbol table before it is actually built. This is fine as long as I promise to actually build it. In an imperative programming language, this would be like sending a message back in time. This does work in Haskell, but care must be taken to not introduce a temporal paradox.
Here's a terse example:
module Main where
import Control.Monad.Error
import Control.Monad.RWS
import Data.Maybe ( catMaybes )
import qualified Data.Map as Map
import Data.Map ( Map )
type Symtab = Map String Int
type RenameM = ErrorT String (RWS Symtab String Symtab)
data Cmd = Def String Int
| Use String
renameM :: [Cmd] -> RenameM [(String, Int)]
renameM = liftM catMaybes . mapM rename1M
rename1M :: Cmd -> RenameM (Maybe (String, Int))
rename1M (Def name value) = do
modify $ \symtab -> Map.insert name value symtab
return Nothing
rename1M (Use name) = return . liftM ((,) name) . Map.lookup name =<< ask
--rename1M (Use name) =
-- maybe (return Nothing) (return . Just . (,) name) . Map.lookup name =<< ask
--rename1M (Use name) =
-- maybe (throwError $ "Cannot locate " ++ name) (return . Just . (,) name) . Map.lookup name =<< ask
rename :: [Cmd] -> IO ()
rename cmds = do
let (result, symtab, log) = runRWS (runErrorT $ renameM cmds) symtab Map.empty
print result
main :: IO ()
main = do
rename [ Use "foo"
, Def "bar" 2
, Use "bar"
, Def "foo" 1
]
This is the line where the knot is tied:
let (result, symtab, log) = runRWS (runErrorT $ renameM cmds) symtab Map.empty
The running symbol table is stored in the MonadState of the RWS, and the final symbol table is stored in the MonadReader.
In the above example, I have 3 versions of rename1M for Uses (2 are commented out). In this first form, it works fine.
If you comment out the first rename1M Use, and uncomment the second, the program does not terminate. However, it is, in spirit, no different than the first form. The difference is that it has two returns instead of one, so the Maybe returned from Map.lookup must be evaluated to see which path to take.
The third form is the one that I really want. I want to throw an error if I can't find a symbol. But this version also does not terminate. Here, the temporal paradox is obvious; the decision about whether the the symbol will be in the table can affect whether it will be in the table...
So, my question is, is there an elegant way to do what the third version does (throw an error) without running into the paradox? Send the errors on the MonadWriter without allowing the lookup to change the path? Two passes?
Do you really have to interrupt execution when an error occurs? An alternative approach would be to log errors. After tying the knot, you can check whether the list of errors is empty. I've taken this approach in the past.
-- I've wrapped a writer in a writer transformer. You'll probably want to implement it differently to avoid ambiguity
-- related to writer methods.
type RenameM = WriterT [RenameError] (RWS Symtab String Symtab)
rename1M (Use name) = do
symtab_entry <- asks (Map.lookup name)
-- Write a list of zero or more errors. Evaluation of the list is not forced until all processing is done.
tell $ if isJust symtab_entry then [] else missingSymbol name
return $ Just (name, fromMaybe (error "lookup failed") symtab_entry)
rename cmds = do
let ((result, errors), symtab, log) = runRWS (runWriterT $ renameM cmds) symtab Map.empty
-- After tying the knot, check for errors
if null errors then print result else print errors
This does not produce laziness-related nontermination problems because the contents of the symbol table are not affected by whether or not a lookup succeeded.
I don't have a well thought out answer, but one quick thought. Your single pass over the AST takes all the Def and produces a (Map Symbol _), and I wonder if the same AST pass can take all the Use and produce a (Set Symbol) as well as the lazy lookup.
Afterwards you can quite safely compare the Symbols in the keys of the Map with the Symbols in the Set. If the Set has anything not in the Map then you can report all of those Symbols are errors. If any Def'd Symbols are not in in the Set then you can warn about unused Symbols.
Related
Consider a simple Haskell Brainf*ck interpreter. Just look at the interpret function.
import Prelude hiding (Either(..))
import Control.Monad
import Data.Char (ord, chr)
-- function in question
interpret :: String -> IO ()
interpret strprog = let (prog, []) = parse strprog
in execBF prog
interpretFile :: FilePath -> IO ()
interpretFile fp = readFile fp >>= interpret
type BF = [BFInstr]
data BFInstr = Left | Right | Inc | Dec | Input | Output | Loop BF
type Tape = ([Integer], [Integer])
emptyTape = (repeat 0, repeat 0)
execBFTape :: Tape -> BF -> IO Tape
execBFTape = foldM doBF
execBF :: BF -> IO ()
execBF prog = do
execBFTape emptyTape prog
return ()
doBF :: Tape -> BFInstr -> IO Tape
doBF ((x:lefts), rights) Left = return (lefts, x:rights)
doBF (lefts, (x:rights)) Right = return (x:lefts, rights)
doBF (left, (x:rights)) Inc = return (left, (x+1):rights)
doBF (left, (x:rights)) Dec = return (left, (x-1):rights)
doBF (left, (_:rights)) Input = getChar >>= \c -> return (left, fromIntegral (ord c):rights)
doBF t#(_, (x: _)) Output = putChar (chr (fromIntegral x)) >> return t
doBF t#(left, (x: _)) (Loop bf) = if x == 0
then return t
else do t' <- execBFTape t bf
doBF t' (Loop bf)
simpleCommands = [('<', Left),
('>', Right),
(',', Input),
('.', Output),
('+', Inc),
('-', Dec)]
parse :: String -> (BF, String)
parse [] = ([], [])
parse (char:prog) = case lookup char simpleCommands of
Just command -> let (rest, prog') = parse prog
in (command : rest, prog')
Nothing ->
case char of
']' -> ([], prog)
'[' -> let (loop, prog') = parse prog
(rest, prog'') = parse prog'
in (Loop loop:rest, prog'')
_ -> parse prog
So I have a function applied like interpret "[->+<]". This gives me an IO () monadic action which executes the given program. It has the right type to be a main of some program.
Let's say I would like to have this action compiled to an executable, that is, I would like to generate an executable file with the result of interpret ... to be the main function. Of course, this executable would have to contain the GHC runtime system (for infinite lists, integer arithmetic etc.).
Questions:
It is my opinion that it is not possible at all to just take the monadic action and save it to be a new file. Is this true?
How could one go about reaching a comparable solution? Do the GHC Api and hint help?
EDIT
Sorry, I oversimplified in the original question. Of course, I can just write a file like this:
main = interpret "..."
But this is not what we usually do when we try to compile something, so consider interpretFile :: FilePath -> IO () instead. Let the BF program be saved in a file (helloworld.bf).
How would I go about creating an executable which executes the contents of helloworld.bf without actually needing the file?
$ ./MyBfCompiler helloworld.bf -o helloworld
The answer is basically no.
There are many ways to construct IO values:
Built in functions like putStrLn
Monad operations like return or >>=
Once you have an IO value there are three ways to break it down:
Set main equal to the value
unsafePerformIO
As the return value of an exported C function
All of these break down into converting an IO a into an a. There is no other way to inspect it to see what it does.
Similarly the only thing you can do with functions is put them in variables or call them (or convert them to C function pointers).
There is no sane way to otherwise inspect a function.
One thing you could do which isn’t compiling but is linking is to have your interpreter main function run on some external c string, build that into a static object, and then your “compiler” could make a new object with this C string of the program in it and link that to what you already have.
There is this theory of partial evaluation that says that if you do partial evaluation of a partial evaluator applied to an interpreter applied to some input then what you get is a compiler but ghc is not a sufficiently advanced partial evaluator.
I’m not sure whether you’re asking how you write a compiler that can take as its input a file such as helloworld.bf, or how you compile a Haskell program that runs helloworld.bf.
In the former case, you would want something a little more fleshed out than this:
import System.Environment (getArgs)
main :: IO ()
main = do
(_:fileName:_) <- getArgs
source <- readFile fileName
interpret source
interpret :: String -> IO ()
interpret = undefined -- You can fill in this piddly little detail yourself.
If you want the latter, there are a few different options. First, you can store the contents of your *.bf file in a string constant (or bettter yet, a Text or strict ByteString), and pass that to your interpreter function. I’d be surprised if GHC is optimistic enough to fully inline and expand that call at compile time, but in principle a Haskell compiler could.
The second is to turn Brainfuck into a domain-specific language with operators you define, so that you can actually write something like
interpret [^<,^+,^>,^.]
If you define (^<) and the other operators, the Brainfuck commands will compile to bytecode representing the Brainfuck program.
In this case, there isn’t an obvious benefit over the first approach, but with a more structured language, you can do an optimization pass, compile the source to stack-based bytecode more suitable for an interpreter to execute, or generate a more complex AST.
You might also express this idea as
interpret
(^< ^+ ^> ^.)
input
Here, if the Brainfuck commands are higher-order functions with right-to-left precedence, and interpret bf input = (bf begin) input, the Brainfuck code would simply compile to a function that the interpreter calls. This has the best chance of being turned into fast native code.
Previous Answer
In certain cases, a compiler can inline a function call (there are pragmas in GHC to tell it to do this). The compiler is also more likely to do what you want if you name the closure, such as:
main = interpret foo
In GHC, you can give the compiler a hint by adding
{-# INLINE main #-}
or even
{-# INLINE interpret #-}
You can check what code GHC generated by compiling the module with -S and looking through the source.
I beg for your help, speeding up the following program:
main = do
jobsToProcess <- fmap read getLine
forM_ [1..jobsToProcess] $ \_ -> do
[r, k] <- fmap (map read . words) getLine :: IO [Int]
putStrLn $ doSomeReallyLongWorkingJob r k
There could(!) be a lot of identical jobs to do, but it's not up to me modifying the inputs, so I tried to use Data.HashMap for backing up already processed jobs. I already optimized the algorithms in the doSomeReallyLongWorkingJob function, but now it seems, it's quite as fast as C.
But unfortunately it seems, I'm not able to implement a simple cache without producing a lot of errors. I need a simple cache of Type HashMap (Int, Int) Int, but everytime I have too much or too few brackets. And IF I manage to define the cache, I'm stuck in putting data into or retrieving data from the cache cause of lots of errors.
I already Googled for some hours but it seems I'm stuck. BTW: The result of the longrunner is an Int as well.
It's pretty simple to make a stateful action that caches operations. First some boilerplate:
{-# LANGUAGE FlexibleContexts #-}
import Control.Monad.State
import Data.Map (Map)
import qualified Data.Map as M
import Debug.Trace
I'll use Data.Map, but of course you can substitute in a hash map or any similar data structure without much trouble. My long-running computation will just add up its arguments. I'll use trace to show when this computation is executed; we'll hope not to see the output of the trace when we enter a duplicate input.
reallyLongRunningComputation :: [Int] -> Int
reallyLongRunningComputation args = traceShow args $ sum args
Now the caching operation will just look up whether we've seen a given input before. If we have, we'll return the precomputed answer; otherwise we'll compute the answer now and store it.
cache :: (MonadState (Map a b) m, Ord a) => (a -> b) -> a -> m b
cache f x = do
mCached <- gets (M.lookup x)
case mCached of
-- depending on your goals, you may wish to force `result` here
Nothing -> modify (M.insert x result) >> return result
Just cached -> return cached
where
result = f x
The main function now just consists of calling cache reallyLongRunningComputation on appropriate inputs.
main = do
iterations <- readLn
flip evalStateT M.empty . replicateM_ iterations
$ liftIO getLine
>>= liftIO . mapM readIO . words
>>= cache reallyLongRunningComputation
>>= liftIO . print
Let's try it in ghci!
> main
5
1 2 3
[1,2,3]
6
4 5
[4,5]
9
1 2
[1,2]
3
1 2
3
1 2 3
6
As you can see by the bracketed outputs, reallyLongRunningComputation was called the first time we entered 1 2 3 and the first time we entered 1 2, but not the second time we entered these inputs.
I hope i'm not too far off base, but first you need a way to carry around the past jobs with you. Easiest would be to use a foldM instead of a forM.
import Control.Monad
import Data.Maybe
main = do
jobsToProcess <- fmap read getLine
foldM doJobAcc acc0 [1..jobsToProcess]
where
acc0 = --initial value of some type of accumulator, i.e. hash map
doJobAcc acc _ = do
[r, k] <- fmap (map read . words) getLine :: IO [Int]
case getFromHash acc (r,k) of
Nothing -> do
i <- doSomeReallyLongWorkingJob r k
return $ insertNew acc (r,k) i
Just i -> do
return acc
Note, I don't actually use the interface for putting and getting the hash table key. It doesn't actually have to be a hash table, Data.Map from containers could work. Or even a list if its going to be a small one.
Another way to carry around the hash table would be to use a State transformer monad.
I am just adding this answer since I feel like the other answers are diverging a bit from the original question, namely using hashtable constructs in Main function (inside IO monad).
Here is a minimal hashtable example using hashtables module. To install the module with cabal, simply use
cabal install hashtables
In this example, we simply put some values in a hashtable and use lookup to print a value retrieved from the table.
import qualified Data.HashTable.IO as H
main :: IO ()
main = do
t <- H.new :: IO (H.CuckooHashTable Int String)
H.insert t 22 "Hello world"
H.insert t 5 "No problem"
msg <- H.lookup t 5
print msg
Notice that we need to use explicit type annotation to specify which implementation of the hashtable we wish to use.
I'm trying to learn the basics of Haskell while developing a filter for Pandoc to recursively include additional markdown files.
Based on the scripting guide I was able to create a somewhat working filter. This looks for CodeBlocks with the include class and tries to include the ASTs of the referenced files.
```include
section-1.md
section-2.md
#pleasedontincludeme.md
```
The whole filter and the input sources could be found in the following repository: steindani/pandoc-include (or see below)
One could run pandoc with the filter and see the output in markdown format using the following command: pandoc -t json input.md | runhaskell IncludeFilter.hs | pandoc --from json --to markdown
I've noticed that the map function (at line 38) — although gets the list of files to include — only calls the function for the first element. And this is not the only strange behavior. The included file could also have an include block that is processed and the referenced file is included; but it won't go deeper, the include blocks of the last file are ignored.
Why does not the map function iterate over the whole list? Why does it stop after 2 levels of hierarchy?
Please note that I'm just starting to learn Haskell, I'm sure I made mistakes, but I'm happy to learn.
Thank you
Full source code:
module Text.Pandoc.Include where
import Control.Monad
import Data.List.Split
import Text.Pandoc.JSON
import Text.Pandoc
import Text.Pandoc.Error
stripPandoc :: Either PandocError Pandoc -> [Block]
stripPandoc p =
case p of
Left _ -> [Null]
Right (Pandoc _ blocks) -> blocks
ioReadMarkdown :: String -> IO(Either PandocError Pandoc)
ioReadMarkdown content = return (readMarkdown def content)
getContent :: String -> IO [Block]
getContent file = do
c <- readFile file
p <- ioReadMarkdown c
return (stripPandoc p)
doInclude :: Block -> IO [Block]
doInclude cb#(CodeBlock (_, classes, _) list) =
if "include" `elem` classes
then do
files <- return $ wordsBy (=='\n') list
contents <- return $ map getContent files
result <- return $ msum contents
result
else
return [cb]
doInclude x = return [x]
main :: IO ()
main = toJSONFilter doInclude
I can spot the following error in your doInclude function:
doInclude :: Block -> IO [Block]
doInclude cb#(CodeBlock (_, classes, _) list) =
if "include" `elem` classes
then do
let files = wordsBy (=='\n') list
let contents = map getContent files
let result = msum contents -- HERE
result
else
return [cb]
doInclude x = return [x]
Since the type of the result of this whole function is IO [Block], we can work backward:
result has type IO [Block]
contents has type [IO [Block]]
msum is being used with type [IO [Block]] -> IO [Block]
And that third part is the problem—somehow in your program, there is a non-standard MonadPlus instance being loaded for IO, and I bet that what it does on msum contents is this:
Execute the first action
If that succeeds, produce the same result as that and discard the rest of the list. (This is the cause of the behavior you observe.)
If it fails with an exception, try the rest of the list.
This isn't a standard MonadPlus instance so it's coming from one of the libraries that you're importing. I don't know which.
A general recommendation here would be:
Split your program into smaller functions
Write type signatures for those functions
Because the problem here seems to be that msum is being used with a different type than the one you expect. Normally this would produce a type error, but here you got unlucky and it interacted with a strange type class instance in some library.
From the comments, your intent with msum contents was to create an IO action that executes all of the subactions in sequence, and collects their result as a list. Well, the MonadPlus class isn't normally defined for IO, and when it is it does something else. So the correct function to use here is sequence:
-- Simplified version, the real one is more general:
sequence :: Monad m => [m a] -> m [a]
sequence [] = return []
sequence (ma:mas) = do
a <- ma
as <- mas
return (a:as)
That gets you from [IO [Block]] to IO [[Block]]. To eliminate the double nested lists then you just use fmap to apply concat inside IO.
I've seen some approaches about parsing program arguments. They seem to be too complicated. I need a simple solution. But I also want to be able (preferably, not necessarily) to refer to the arguments by name. So if a command like looks like this:
./MyApp --arg1Name arg1Value --arg2Name arg2Value
then I'd like to treat them as args["arg1Name"] and args["arg2Name"] to get their values. I know that's not a valid Haskell code, though. What I have now is not much:
main = do
[args] <- getArgs
I repeat, I'd like a simple solution, preferably without involving any third-party haskell libraries.
optparse-applicative is great for argument parsing, and very easy to use! Writing your own argument parser will be much more difficult to get right, change, extend, or otherwise manage than if you take 10 minutes to write a parser with optparse-applicative.
Start by importing the Options.Applicative module.
import Options.Applicative
Next create a data type for your command-line configuration.
data Configuration = Configuration
{ foo :: String
, bar :: Int
}
Now for the workhorse, we create a parser by using the combinators exported from optparse-applicative. Read the documentation on Options.Applicative.Builder for the full experience.
configuration :: Parser Configuration
configuration = Configuration
<$> strOption
( long "foo"
<> metavar "ARG1"
)
<*> option
( long "bar"
<> metavar "ARG2"
)
Now we can execute our Parser Configuration in an IO action to get at our command-line data.
main :: IO ()
main = do
config <- execParser (info configuration fullDesc)
putStrLn (show (bar config) ++ foo config)
And we're done! You can easily extend this parser to support a --help argument to print out usage documentation (make a new parser with helper <*> configuration and pass it to info), you can add default values for certain arguments (include a <> value "default" clause in the arguments to strOption or option), you can support flags, or sub-parsers or generate tab-completion data.
Libraries are a force multiplier! The investment you make in learning the basics of a good library will pay dividends in what you're able to accomplish, and tasks will often be easier (and quicker!) with the proper tool than with a "fast" solution thrown together out of duct tape.
How about just parsing them in pairs and adding them to a map:
simpleArgsMap :: [String] -> Map String String
simpleArgsMap [] = Map.empty
simpleArgsMap (('-':'-':name):value:rest)
= Map.insert name value (simpleArgsMap rest)
simpleArgsMap xs = error $ "Couldn't parse arguments: " ++ show xs
A simple wrapper program to show this working:
module Args where
import System.Environment ( getArgs )
import Control.Applicative ( (<$>) )
import qualified Data.Map as Map
import Data.Map ( Map )
main = do
argsMap <- simpleArgsMap <$> getArgs
print $ Map.lookup "foo" argsMap
I just start learning Template Haskell, and stuck on simple problem with splicing.
In one module I've implemented function tupleN which replies N-th element of the tuple:
tupleN :: Lift a => a -> Int -> Q Exp
tupleN a n = do
(TupE as) <- lift a
return $ as !! n
In the Main module I have:
main :: IO ()
main = do
let tup = (1::Int,'a',"hello")
putStrLn $ show $(tupleN $tup 1)
This seems to be working, but it wouldn't. Compiler prints error:
GHC stage restriction: `tup'
is used in a top-level splice or annotation,
and must be imported, not defined locally
In the expression: tup
In the first argument of `tupleN', namely `$tup'
In the expression: tupleN ($tup) 1
If I put tuple description right into spliced expression, code become working:
main :: IO ()
main = do
putStrLn $ show $(tupleN (1::Int,'a',"hello") 1)
What I missing with the first variant?
You've tried to use tup as a splice, but tup is just an ordinary value. You don't want to prefix it with a $.
Moreover, as the compile error states, since Template Haskell runs during the compilation process, GHC really needs to know what it's doing before it has finished compiling the current module. That means your splice expression can't depend on tup, because that's still being compiled. Inside splices, you can only use literals, imported values, and the special 'name and ''TypeName forms (which you can think of as a sort of literal, I suppose). You can get at some of the information from this compilation by using e.g. reify, but even that can only give you data that's available at compile time – if you want a function you can pass user input, or data constructed from user input, to, that's just impossible.
In short, you can't do exactly what you want to do using Template Haskell. You could, however, define a splice that expands to a function to get the ith element of a tuple of size sz:
import Control.Monad (unless)
import Language.Haskell.TH
tupleN :: Int -> Int -> Q Exp
tupleN sz i = do
unless (i < sz) . reportError $ "tupleN: index " ++ show i
++ " out of bounds for " ++ show sz ++ "-tuple"
lamE
[tupP (replicate i wildP
++ [varP (mkName "x")]
++ replicate (sz - i - 1) wildP)]
(varE (mkName "x"))