Pretty print llvm-general-pure ASTs as llvm-ir? - haskell

I used llvm-general-pure to build an abstract syntax trees for a program in the LLVM language.
Using the provided pretty printer, I get output that looks like
A.Module {
A.moduleName = "main",
A.moduleDataLayout = Nothing,
A.moduleTargetTriple = Nothing,
A.moduleDefinitions = [
...
A.GlobalDefinition A.G.Function {
A.G.linkage = A.L.External,
A.G.visibility = A.V.Default,
A.G.callingConvention = A.CC.C,
A.G.returnAttributes = [],
A.G.returnType = A.IntegerType {A.typeBits = 32},
A.G.name = A.Name "Main",
A.G.parameters = ([], False),
A.G.functionAttributes = [],
A.G.section = Nothing,
A.G.alignment = 0,
A.G.garbageCollectorName = Nothing,
A.G.basicBlocks = [
A.G.BasicBlock (A.Name "mainBlock") [
A.Name "n57" A.:= A.Alloca {
A.allocatedType = A.IntegerType {A.typeBits = 64},
A.numElements = Nothing,
A.alignment = 0,
A.metadata = []
},
...
I want output that looks like
define i32 #main() {
mainBlock:
%n57 = alloca i64
...
}
...
It looks suspiciously like there's an automatically generated parser for the LLVM language in the llvm-general-quote package, but no corresponding pretty printer.
Stephen Diehl's excellent article hints at something called moduleString.

llvm-general-pure doesn't have a pure pretty printer, we have to go through llvm-general to do this. It can print out the IR by going through withModuleFromAST on the Haskell AST to manifest the Module representation (i.e. the C++ Module) of the IR and then calling moduleLLVMAssembly to invoke the pretty printer.
moduleLLVMAssembly :: Mod.Module -> IO String
withModuleFromAST :: Context -> AST.Module -> (Mod.Module -> IO a) -> ErrorT String IO a
This isn't pure Haskell though, it's all going through the FFI to call LLVM's internal functions.
import LLVM.General.Module as Mod
import qualified LLVM.General.AST as AST
ppModule :: AST.Module -> IO ()
ppModule ast = withContext $ \ctx ->
runExceptT $ withModuleFromAST ctx ast $ \m -> do
llstr <- moduleLLVMAssembly m
putStrLn llstr
There's no reason we couldn't have a pure pretty printer though and indeed I started on a project to do just this called llvm-pp, but it's just a large amount of mind-numbingly boring work to write a pretty printer for the whole LLVM specification.

Related

Errorbundles after parsing with megaparsec

I currently have a working parser in megaparsec, where I build an AST for my program. I now want to do some weeding operations on my AST, while being able to use the same kind of pretty errors as the parser. While this stage is after parsing, I'm wondering if there are general practices for megaparsec in doing so. Is there a way for me to extract every line and comment (used in the bundle) and add it to each item in my AST? Is there any other way that people tackle this problem?
Apologies in advance if this sounds open ended, but I'm mainly wondering is there are some better ideas than getting the line numbers and creating bundles myself. I'm still new to haskell so I haven't been able to navigate properly through all the source code.
This was answered by the megaparsec developer here.
To summarize, parsers have a getOffset function that returns the current char index. You can use that along with an initial PosState to create an error bundle which you can later pretty print.
I have a sample project within the github thread, and pasted again here:
module TestParser where
import Data.List.NonEmpty as NonEmpty
import qualified Data.Maybe as Maybe
import qualified Data.Set as Set
import Data.Void
import Parser
import Text.Megaparsec
data Sample
= Test Int
String
| TestBlock [Sample]
| TestBlank
deriving (Show, Eq)
sampleParser :: Parser Sample
sampleParser = do
l <- many testParser
return $ f l
where
f [] = TestBlank
f [s] = s
f p = TestBlock p
testParser :: Parser Sample
testParser = do
offset <- getOffset
test <- symbol "test"
return $ Test offset test
fullTestParser :: Parser Sample
fullTestParser = baseParser testParser
testParse :: String -> Maybe (ParseErrorBundle String Void)
testParse input =
case parse (baseParser sampleParser) "" input of
Left e -> Just e
Right x -> do
(offset, msg) <- testVerify x
let initialState =
PosState
{ pstateInput = input
, pstateOffset = 0
, pstateSourcePos = initialPos ""
, pstateTabWidth = defaultTabWidth
, pstateLinePrefix = ""
}
let errorBundle =
ParseErrorBundle
{ bundleErrors = NonEmpty.fromList [TrivialError offset Nothing Set.empty]
-- ^ A collection of 'ParseError's that is sorted by parse error offsets
, bundlePosState = initialState
-- ^ State that is used for line\/column calculation
}
return errorBundle
-- Sample verify; throw an error on the second test key
testVerify :: Sample -> Maybe (Int, String)
testVerify tree =
case tree of
TestBlock [_, Test a _, _] -> Just (a, "Bad")
_ -> Nothing
testMain :: IO ()
testMain = do
testExample "test test test"
putStrLn "Done"
testExample :: String -> IO ()
testExample input =
case testParse input of
Just error -> putStrLn (errorBundlePretty error)
Nothing -> putStrLn "pass"
Some parts are from other files, but the important parts are in the code.

Haskell: Can a function be compiled?

Consider a simple Haskell Brainf*ck interpreter. Just look at the interpret function.
import Prelude hiding (Either(..))
import Control.Monad
import Data.Char (ord, chr)
-- function in question
interpret :: String -> IO ()
interpret strprog = let (prog, []) = parse strprog
in execBF prog
interpretFile :: FilePath -> IO ()
interpretFile fp = readFile fp >>= interpret
type BF = [BFInstr]
data BFInstr = Left | Right | Inc | Dec | Input | Output | Loop BF
type Tape = ([Integer], [Integer])
emptyTape = (repeat 0, repeat 0)
execBFTape :: Tape -> BF -> IO Tape
execBFTape = foldM doBF
execBF :: BF -> IO ()
execBF prog = do
execBFTape emptyTape prog
return ()
doBF :: Tape -> BFInstr -> IO Tape
doBF ((x:lefts), rights) Left = return (lefts, x:rights)
doBF (lefts, (x:rights)) Right = return (x:lefts, rights)
doBF (left, (x:rights)) Inc = return (left, (x+1):rights)
doBF (left, (x:rights)) Dec = return (left, (x-1):rights)
doBF (left, (_:rights)) Input = getChar >>= \c -> return (left, fromIntegral (ord c):rights)
doBF t#(_, (x: _)) Output = putChar (chr (fromIntegral x)) >> return t
doBF t#(left, (x: _)) (Loop bf) = if x == 0
then return t
else do t' <- execBFTape t bf
doBF t' (Loop bf)
simpleCommands = [('<', Left),
('>', Right),
(',', Input),
('.', Output),
('+', Inc),
('-', Dec)]
parse :: String -> (BF, String)
parse [] = ([], [])
parse (char:prog) = case lookup char simpleCommands of
Just command -> let (rest, prog') = parse prog
in (command : rest, prog')
Nothing ->
case char of
']' -> ([], prog)
'[' -> let (loop, prog') = parse prog
(rest, prog'') = parse prog'
in (Loop loop:rest, prog'')
_ -> parse prog
So I have a function applied like interpret "[->+<]". This gives me an IO () monadic action which executes the given program. It has the right type to be a main of some program.
Let's say I would like to have this action compiled to an executable, that is, I would like to generate an executable file with the result of interpret ... to be the main function. Of course, this executable would have to contain the GHC runtime system (for infinite lists, integer arithmetic etc.).
Questions:
It is my opinion that it is not possible at all to just take the monadic action and save it to be a new file. Is this true?
How could one go about reaching a comparable solution? Do the GHC Api and hint help?
EDIT
Sorry, I oversimplified in the original question. Of course, I can just write a file like this:
main = interpret "..."
But this is not what we usually do when we try to compile something, so consider interpretFile :: FilePath -> IO () instead. Let the BF program be saved in a file (helloworld.bf).
How would I go about creating an executable which executes the contents of helloworld.bf without actually needing the file?
$ ./MyBfCompiler helloworld.bf -o helloworld
The answer is basically no.
There are many ways to construct IO values:
Built in functions like putStrLn
Monad operations like return or >>=
Once you have an IO value there are three ways to break it down:
Set main equal to the value
unsafePerformIO
As the return value of an exported C function
All of these break down into converting an IO a into an a. There is no other way to inspect it to see what it does.
Similarly the only thing you can do with functions is put them in variables or call them (or convert them to C function pointers).
There is no sane way to otherwise inspect a function.
One thing you could do which isn’t compiling but is linking is to have your interpreter main function run on some external c string, build that into a static object, and then your “compiler” could make a new object with this C string of the program in it and link that to what you already have.
There is this theory of partial evaluation that says that if you do partial evaluation of a partial evaluator applied to an interpreter applied to some input then what you get is a compiler but ghc is not a sufficiently advanced partial evaluator.
I’m not sure whether you’re asking how you write a compiler that can take as its input a file such as helloworld.bf, or how you compile a Haskell program that runs helloworld.bf.
In the former case, you would want something a little more fleshed out than this:
import System.Environment (getArgs)
main :: IO ()
main = do
(_:fileName:_) <- getArgs
source <- readFile fileName
interpret source
interpret :: String -> IO ()
interpret = undefined -- You can fill in this piddly little detail yourself.
If you want the latter, there are a few different options. First, you can store the contents of your *.bf file in a string constant (or bettter yet, a Text or strict ByteString), and pass that to your interpreter function. I’d be surprised if GHC is optimistic enough to fully inline and expand that call at compile time, but in principle a Haskell compiler could.
The second is to turn Brainfuck into a domain-specific language with operators you define, so that you can actually write something like
interpret [^<,^+,^>,^.]
If you define (^<) and the other operators, the Brainfuck commands will compile to bytecode representing the Brainfuck program.
In this case, there isn’t an obvious benefit over the first approach, but with a more structured language, you can do an optimization pass, compile the source to stack-based bytecode more suitable for an interpreter to execute, or generate a more complex AST.
You might also express this idea as
interpret
(^< ^+ ^> ^.)
input
Here, if the Brainfuck commands are higher-order functions with right-to-left precedence, and interpret bf input = (bf begin) input, the Brainfuck code would simply compile to a function that the interpreter calls. This has the best chance of being turned into fast native code.
Previous Answer
In certain cases, a compiler can inline a function call (there are pragmas in GHC to tell it to do this). The compiler is also more likely to do what you want if you name the closure, such as:
main = interpret foo
In GHC, you can give the compiler a hint by adding
{-# INLINE main #-}
or even
{-# INLINE interpret #-}
You can check what code GHC generated by compiling the module with -S and looking through the source.

Default values in Haskell data types

When you define a class in a object-oriented language it usually sets the default values for the member variables. Is there any mechanism in Haskell to do the same thing in record types? And a follow up question: If we don't know from the very start all the values for a data constructor but we obtain them from IO interaction can we build the type using something like the builder pattern from OOP?
Thanks in advance
A common idiom is to define a default value.
data A = A { foo :: Int , bar :: String }
defaultA :: A
defaultA = A{foo = 0, bar = ""}
This can be then (purely) "updated" later on with real values.
doSomething :: Bool -> A
doSomething True = defaultA{foo = 32}
doSomething False = defaultA{bar = "hello!"}
Pseudocode example:
data Options = O{ textColor :: Bool, textSize :: Int, ... }
defaultOptions :: Options
defaultOptions = O{...}
doStuff :: Options -> IO ()
doStuff opt = ...
main :: IO ()
main = do
...
-- B&W, but use default text size
doStuff defaultOptions{ color = False }
If there are no sensible default values, you can wrap the field values in Maybe.
If you feel adventurous, you can even use a more advanced approach to statically separate "intermediate" options values, which can lack a few fields, from "finalized" ones, which must have all the fields. (I'd not recommend this to Haskell beginners, though.)
Is there any mechanism in Haskell to do the same thing in record types?
What you can do is hide the constructor, and provide a function as constructor instead.
Say for instance we have a list we want to update, together with a revision number, then we can define it as:
data RevisionList a = RevisionList { theList :: [a],
revision :: Int }
deriving Show
Now we can define a function that initializes the BuildList with an initial list:
revisionList :: [a] -> RevisionList a
revisionList xs = RevisionList { theList = xs, revision=0 }
and by hiding the constructor in the module export, we thus hide the possibility to initialize it with another revision than revision 0. So the module could look like:
module Foo(RevisionList(), revisionList)
data RevisionList a = RevisionList { theList :: [a],
revision :: Int }
revisionList :: [a] -> RevisionList a
revisionList xs = RevisionList { theList = xs, revision=0 }
something like the builder pattern from OOP?
We can for instance use a State monad for that. For instance:
module Foo(RevisionList(), revisionList,
increvision, RevisionListBuilder, prefixList)
import Control.Monad.State.Lazy
type RevisionListBuilder a = State (RevisionList a)
increvision :: RevisionListBuilder a ()
increvision = do
rl <- get
put (rl { revision = 1 + revision rl})
prefixList :: a -> RevisionListBuilder a ()
prefixList x = do
rl <- get
put (rl { theList = x : theList rl })
increvision
So we get the RevisionList thus far, perform updates, put the new result back, and increment the revision number.
So now another module can import our Foo, and use the builder like:
some_building :: RevisionListBuilder Int ()
some_building = do
prefixList 4
prefixList 1
and now we can "make" a RevisionList at revision 2 with as final list [1,4,2,5] with:
import Control.Monad.State.Lazy(execState)
some_rev_list :: RevisionList Int
some_rev_list = execState some_building (revisionList [2,5])
So it would look approximately like:
Foo.hs:
module Foo(RevisionList(), revisionList,
increvision, RevisionListBuilder, prefixList)
data RevisionList a = RevisionList { theList :: [a],
revision :: Int }
deriving Show
type RevisionListBuilder a = State (RevisionList a)
revisionList :: [a] -> RevisionList a
revisionList xs = RevisionList { theList = xs, revision=0 }
increvision :: RevisionListBuilder a ()
increvision = do
rl <- get
put (rl { revision = 1 + revision rl})
prefixList :: a -> RevisionListBuilder a ()
prefixList x = do
rl <- get
put (rl { theList = x : theList rl })
increvision
Bar.hs:
import Foo
import Control.Monad.State.Lazy(execState)
some_building :: RevisionListBuilder Int ()
some_building = do
prefixList 4
prefixList 1
some_rev_list :: RevisionList Int
some_rev_list = execState some_building (revisionList [2,5])
So now we have constructed a some_rev_list with the "building" of some_building:
Foo Bar> some_rev_list
RevisionList {theList = [1,4,2,5], revision = 2}
There's already good answers here, so this answer is only meant as a supplement to the fine answers from chi and Willem Van Onsem.
In mainstream object-oriented languages like Java and C#, it's not that a default object is uninitialised; rather, a default object is normally initialised with default values for their types, and it just happens that for reference types, the default is a null reference.
Haskell doesn't have null references, so records can't be initialised with nulls. The most direct translation of objects would be records where every single constituent element is a Maybe. That's not particularly useful, however, but it highlights how hard it is to protect invariants in OOP.
The Builder pattern doesn't solve that problem at all. Any Builder has to start with an initial Builder object, and that object is going to have to have default values as well.
For more details, and lots of examples, I wrote an article series about this. The article series specifically focuses on the Test Data Builder pattern, but you should be able to see how it generalises to the Fluent Builder pattern in general.

Using content of a string to call function with same name

I have a main like the following:
main :: IO ()
main = do
args <- getArgs
putStrLn $ functionName args
where
functionName args = "problem" ++ (filter (/= '"') $ show (args!!0))
Instead of putting the name to stdout like I do it right now, I want to call the function.
I am aware of the fact, that I could use hint (as mentioned in Haskell: how to evaluate a String like "1+2") but I think that would be pretty overkill for just getting that simple function name.
At the current stage it does not matter if the program crashes if the function does not exist!
Without taking special measures to preserve them, the names of functions will likely be gone completely in a compiled Haskell program.
I would suggest just making a big top-level map:
import Data.Map ( Map )
import qualified Data.Map as Map
functions :: Map String (IO ())
functions = Map.fromList [("problem1", problem1), ...]
call :: String -> IO ()
call name =
case Map.lookup name of
Nothing -> fail $ name + " not found"
Just m -> m
main :: IO ()
main = do
args <- getArgs
call $ functionName args
where
functionName args = "problem" ++ (filter (/= '"') $ show (args!!0))
If you're going to do this, you have a few approaches, but the easiest by far is to just pattern match on it
This method requires that all of your functions you want to call have the same type signature:
problem1 :: Int
problem1 = 1
problem2 :: Int
problem2 = 2
runFunc :: String -> Maybe Int
runFunc "problem1" = Just problem1
runFunc "problem2" = Just problem2
runFunc _ = Nothing
main = do
args <- getArgs
putStrLn $ runFunc $ functionName args
This requires you to add a line to runFunc each time you add a new problemN, but that's pretty manageable.
You can't get a string representation of an identifier, not without fancy non-standard features, because that information isn't retained after compilation. As such, you're going to have to write down those function names as string constants somewhere.
If the function definitions are all in one file anyway, what I would suggest is to use data types and lambdas to avoid having to duplicate those function names altogether:
Data Problem = {
problemName :: String,
evalProblem :: IO () # Or whatever your problem function signatures are
}
problems = [Problem]
problems = [
Problem {
problemName = "problem1",
evalProblem = do ... # Insert code here
},
Problem
problemName = "problem2",
evalProblem = do ... # Insert code here
}
]
main :: IO ()
main = do
args <- getArgs
case find (\x -> problemName x == (args!!0)) problems of
Just x -> evalProblem x
Nothing -> # Handle error
Edit: Just to clarify, I'd say the important takeaway here is that you have an XY Problem.

How do I avoid memory problems when writing to file using the Writer monad?

I am building some moderately large DIMACS files, however with the method used below the memory usage is rather large compared to the size of the files generated, and on some of the larger files I need to generate I run in to out of memory problems.
import Control.Monad.State.Strict
import Control.Monad.Writer.Strict
import qualified Data.ByteString.Lazy.Char8 as B
import Control.Monad
import qualified Text.Show.ByteString as BS
import Data.List
main = printDIMACS "test.cnf" test
test = do
xs <- freshs 100000
forM_ (zip xs (tail xs))
(\(x,y) -> addAll [[negate x, negate y],[x,y]])
type Var = Int
type Clause = [Var]
data DIMACSS = DS{
nextFresh :: Int,
numClauses :: Int
} deriving (Show)
type DIMACSM a = StateT DIMACSS (Writer B.ByteString) a
freshs :: Int -> DIMACSM [Var]
freshs i = do
next <- gets nextFresh
let toRet = [next..next+i-1]
modify (\s -> s{nextFresh = next+i})
return toRet
fresh :: DIMACSM Int
fresh = do
i <- gets nextFresh
modify (\s -> s{nextFresh = i+1})
return i
addAll :: [Clause] -> DIMACSM ()
addAll c = do
tell
(B.concat .
intersperse (B.pack " 0\n") .
map (B.unwords . map BS.show) $ c)
tell (B.pack " 0\n")
modify (\s -> s{numClauses = numClauses s + length c})
add h = addAll [h]
printDIMACS :: FilePath -> DIMACSM a -> IO ()
printDIMACS file f = do
writeFile file ""
appendFile file (concat ["p cnf ", show i, " ", show j, "\n"])
B.appendFile file b
where
(s,b) = runWriter (execStateT f (DS 1 0))
i = nextFresh s - 1
j = numClauses s
I would like to keep the monadic building of clauses since it is very handy, but I need to overcome the memory problem. How do I optimize the above program so that it doesn't use too much memory?
If you want good memory behavior, you need to make sure that you write out the clauses as you generate them, instead of collecting them in memory and dumping them as such, either using lazyness or a more explicit approach such as conduits, enumerators, pipes or the like.
The main obstacle to that approach is that the DIMACS format expects the number of clauses and variables in the header. This prevents the naive implementation from being sufficiently lazy. There are two possibilities:
The pragmatic one is to write the clauses first to a temporary location. After that the numbers are known, so you write them to the real file and append the contents of the temporary file.
The prettier approach is possible if the generation of clauses has no side effects (besides the effects offered by your DIMACSM monad) and is sufficiently fast: Run it twice, first throwing away the clauses and just calculating the numbers, print the header line, run the generator again; now printing the clauses.
(This is from my experience with implementing SAT-Britney, where I took the second approach, because it fitted better with other requirements in that context.)
Also, in your code, addAll is not lazy enough: The list c needs to be retained even after writing (in the MonadWriter sense) the clauses. This is another space leak. I suggest you implement add as the primitive operation and then addAll = mapM_ add.
As explained in Joachim Breitner's answer the problem was that DIMACSM was not lazy enough, both because the strict versions of the monads was used and because the number of variables and clauses are needed before the ByteString can be written to the file. The solution is to use the lazy versions of the Monads and execute them twice. It turns out that it is also necessary to have WriterT be the outer monad:
import Control.Monad.State
import Control.Monad.Writer
...
type DIMACSM a = WriterT B.ByteString (State DIMACSS) a
...
printDIMACS :: FilePath -> DIMACSM a -> IO ()
printDIMACS file f = do
writeFile file ""
appendFile file (concat ["p cnf ", show i, " ", show j, "\n"])
B.appendFile file b
where
s = execState (execWriterT f) (DS 1 0)
b = evalState (execWriterT f) (DS 1 0)
i = nextFresh s - 1
j = numClauses s

Resources