Towards understanding CodeGen* in the Haskell LLVM bindings

Background: I am writing a toy Lisp interpreter/compiler in Haskell for my own amusement/edification. I am trying to add the ability to compile to LLVM bitcode.
Context: I have been reading the documentation for LLVM.Core and a code example (here), trying to understand the means of combination and means of abstraction (as described in Abelson and Sussman's Structure and Interpretation of Computer Programs) used in the Haskell LLVM bindings. There are a lot of small pieces and I am not clear how they are intended to work together. There seems to be a level of abstraction above the basic LLVM machine instructions that is obvious to someone with lots of LLVM experience, but not documented for those, like me, who are just getting their feet wet.
Question: What are CodeGenModule and CodeGenFunction and how are they used to build up Functions and Modules?

The Module and Function types are just thin wrappers around pointers to the corresponding C++ objects (that is, Module* and Value*):
-- LLVM.Core.Util
newtype Module = Module {
      fromModule :: FFI.ModuleRef
    }
    deriving (Show, Typeable)
type Function a = Value (Ptr a)
newtype Value a = Value { unValue :: FFI.ValueRef }
    deriving (Show, Typeable)
-- LLVM.FFI.Core
data Module
    deriving (Typeable)
type ModuleRef = Ptr Module
data Value
    deriving (Typeable)
type ValueRef = Ptr Value
The CodeGenModule and CodeGenFunction types are parts of the EDSL built on top of the LLVM.FFI.* modules. They use Function, Module and the functions from LLVM.FFI.* internally and allow you to write LLVM IR in Haskell concisely using do-notation (example taken from Lennart Augustsson's blog):
mFib :: CodeGenModule (Function (Word32 -> IO Word32))
mFib = do
    fib <- newFunction ExternalLinkage
    defineFunction fib $ \ arg -> do
        -- Create the two basic blocks.
        recurse <- newBasicBlock
        exit <- newBasicBlock
        [...]
        ret r
    return fib
You can think of CodeGenModule as an AST representing a parsed LLVM assembly file (.ll). Given a CodeGenModule, you can e.g. write it to a .bc file:
-- newModule :: IO Module
mod <- newModule
-- defineModule :: Module -> CodeGenModule a -> IO a
defineModule mod $ do [...]
-- writeBitcodeToFile :: FilePath -> Module -> IO ()
writeBitcodeToFile "mymodule.bc" mod
--- Alternatively, just use this function from LLVM.Util.File:
writeCodeGenModule :: FilePath -> CodeGenModule a -> IO ()
I also recommend acquainting yourself with LLVM's core classes, since they also show through in the Haskell API.

CodeGenFunction maintains LLVM assembly code for one function.
CodeGenModule maintains several such functions.
In the Haskell llvm bindings package there is an example directory with working code.
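To make the one-function/many-functions split concrete without depending on the llvm package, here is a toy model in plain Haskell. Everything in it (Gen, ToyFunction, ToyModule, emit, defineToyFunction, buildModule) is hypothetical and only mimics the shape of CodeGenFunction and CodeGenModule: an inner builder accumulates one function's instructions, and an outer builder runs it and records the finished function. The real monads drive LLVM through LLVM.FFI.* rather than collecting strings.

```haskell
-- Toy model only: none of these names belong to the llvm bindings.

-- A minimal hand-rolled state monad, so the sketch needs only base.
newtype Gen s a = Gen { runGen :: s -> (a, s) }

instance Functor (Gen s) where
  fmap f (Gen g) = Gen $ \s -> let (a, s') = g s in (f a, s')

instance Applicative (Gen s) where
  pure a = Gen $ \s -> (a, s)
  Gen gf <*> Gen ga = Gen $ \s ->
    let (f, s1) = gf s
        (a, s2) = ga s1
    in (f a, s2)

instance Monad (Gen s) where
  Gen g >>= k = Gen $ \s -> let (a, s1) = g s in runGen (k a) s1

-- Stands in for CodeGenFunction: the state is one function's instruction list.
type ToyFunction = Gen [String]

-- Stands in for CodeGenModule: the state is the list of finished functions.
type ToyModule = Gen [(String, [String])]

-- Append one instruction to the function currently being built.
emit :: String -> ToyFunction ()
emit instr = Gen $ \instrs -> ((), instrs ++ [instr])

-- Run a function builder to completion and record it in the module,
-- returning the name as a handle (loosely like newFunction + defineFunction).
defineToyFunction :: String -> ToyFunction () -> ToyModule String
defineToyFunction name body = Gen $ \fns ->
  let ((), instrs) = runGen body []
  in (name, fns ++ [(name, instrs)])

-- Run a module builder starting from an empty module (loosely like defineModule).
buildModule :: ToyModule a -> [(String, [String])]
buildModule m = snd (runGen m [])

exampleModule :: ToyModule ()
exampleModule = do
  _ <- defineToyFunction "one" (emit "ret i32 1")
  _ <- defineToyFunction "two" $ do
    emit "%r = add i32 1, 1"
    emit "ret i32 %r"
  pure ()
```

Evaluating buildModule exampleModule yields one (name, instructions) pair per function, in definition order.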

Related

Modeling a domain as a GADT type and providing do-sugar for it

Assume we'd like to build a type that represents operations typical for, let's say, a lock-free algorithm:
{-# LANGUAGE GADTs #-}
newtype IntPtr = IntPtr { ptr :: Int } deriving (Eq, Ord, Show)
data Op r where
    OpRead  :: IntPtr -> Op Int
    OpWrite :: IntPtr -> Int -> Op ()
    OpCAS   :: IntPtr -> Int -> Int -> Op Bool
Ideally, we'd like to express some algorithms in this model using convenient do-notation. Assuming read = OpRead and cas = OpCAS (for aesthetic reasons), here is an almost literal translation of the Wikipedia example:
import Prelude hiding (read)
import Control.Monad.Loops
add :: IntPtr -> Int -> Op Int
add p a = snd <$> do
    iterateUntil fst $ do
        value <- read p
        success <- cas p value (value + a)
        pure (success, value + a)
How could we achieve that? Let's add a couple more constructors to Op to represent pure injected values and the monadic bind:
    OpPure :: a -> Op a
    OpBind :: Op a -> (a -> Op b) -> Op b
So let's try to write a Functor instance. OpPure and OpBind are easy; for instance:
instance Functor Op where
    fmap f (OpPure x) = OpPure (f x)
But the constructors that specify the GADT type start smelling bad:
    fmap f (OpRead ptr) = do
        val <- OpRead ptr
        pure $ f val
Here we assume we'll write the Monad instance later on anyway to avoid ugly nested OpBinds.
Is this the right way to handle such types, or is my design just terribly wrong, this being a sign of it?
This style of using do-notation to build a syntax tree that'll be interpreted later is modelled by the free monad. (I'm actually going to demonstrate what's known as the freer or operational monad, because it's closer to what you have so far.)
Your original Op datatype - without OpPure and OpBind - represents a set of atomic typed instructions (namely read, write and cas). In an imperative language a program is basically a list of instructions, so let's design a datatype which represents a list of Ops.
One idea might be to use an actual list, i.e. type Program r = [Op r]. Clearly that won't do, as it constrains every instruction in the program to have the same return type, which would not make for a very useful programming language.
The key insight is that in any reasonable operational semantics of an interpreted imperative language, control flow doesn't proceed past an instruction until the interpreter has computed a return value for that instruction. That is, the nth instruction of a program depends in general on the results of instructions 0 to n-1. We can model this using continuation passing style.
data Program a where
    Return :: a -> Program a
    Step :: Op r -> (r -> Program a) -> Program a
A Program is a kind of list of instructions: it's either an empty program which returns a single value, or it's a single instruction followed by a list of instructions. The function inside the Step constructor means that the interpreter running the Program has to come up with an r value before it can resume interpreting the rest of the program. So sequentiality is ensured by the type.
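A minimal sketch of an interpreter for Program, assuming a hypothetical single-threaded memory modelled as an association list (Memory, store, runOp, runProgram and increment are illustrative names, not from any library). It shows the sequentiality point directly: runProgram cannot call the continuation inside a Step until runOp has produced that instruction's result.

```haskell
{-# LANGUAGE GADTs #-}

import Data.Maybe (fromMaybe)

-- Op and Program as defined in this answer.
newtype IntPtr = IntPtr { ptr :: Int } deriving (Eq, Ord, Show)

data Op r where
  OpRead  :: IntPtr -> Op Int
  OpWrite :: IntPtr -> Int -> Op ()
  OpCAS   :: IntPtr -> Int -> Int -> Op Bool

data Program a where
  Return :: a -> Program a
  Step   :: Op r -> (r -> Program a) -> Program a

-- Hypothetical memory: address -> value; a missing address reads as 0.
type Memory = [(Int, Int)]

store :: Int -> Int -> Memory -> Memory
store p v mem = (p, v) : filter ((/= p) . fst) mem

-- Execute one instruction, returning its result and the updated memory.
runOp :: Op r -> Memory -> (r, Memory)
runOp (OpRead (IntPtr p)) mem = (fromMaybe 0 (lookup p mem), mem)
runOp (OpWrite (IntPtr p) v) mem = ((), store p v mem)
runOp (OpCAS (IntPtr p) expected new) mem
  | fromMaybe 0 (lookup p mem) == expected = (True, store p new mem)
  | otherwise = (False, mem)

-- The r for each Step must exist before the continuation k can be called,
-- so instructions execute strictly in order.
runProgram :: Program a -> Memory -> (a, Memory)
runProgram (Return x) mem = (x, mem)
runProgram (Step op k) mem = case runOp op mem of
  (r, mem') -> runProgram (k r) mem'

-- Built by hand from Step/Return: read address 0, write back value + 1,
-- and return the old value.
increment :: Program Int
increment =
  Step (OpRead (IntPtr 0)) $ \v ->
  Step (OpWrite (IntPtr 0) (v + 1)) $ \() ->
  Return v
```

Running runProgram increment [(0, 41)] returns the old value 41 together with a memory in which address 0 holds 42.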
To build your atomic programs read, write and cas, you need to put them in a singleton list. This involves putting the relevant instruction in the Step constructor, and passing a no-op continuation.
-- read clashes with Prelude.read; hide it with: import Prelude hiding (read)
lift :: Op a -> Program a
lift i = Step i Return
read ptr = lift (OpRead ptr)
write ptr val = lift (OpWrite ptr val)
cas ptr cmp val = lift (OpCAS ptr cmp val)
Program differs from your tweaked Op in that at each Step there's only ever one instruction. OpBind's left argument was potentially a whole tree of Ops. This would've allowed you to distinguish differently-associated >>=s, breaking the monad associativity law.
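You can make Program a monad. Current GHC also requires Functor and Applicative instances, which can be derived from the Monad instance:
instance Functor Program where
    fmap = liftM   -- liftM and ap come from Control.Monad
instance Applicative Program where
    pure = Return
    (<*>) = ap
instance Monad Program where
    Return x >>= f = f x
    Step i k >>= f = Step i ((>>= f) . k)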
>>= basically performs list concatenation - it walks to the end of the list (by composing recursive calls to itself under the Step continuations) and grafts on a new tail. This makes sense - it corresponds to the intuitive "run this program, then run that program" semantics of >>=.
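With the Monad instance in place, the question's add can be written in do-notation over Program and then run. This self-contained sketch makes two assumptions: iterateUntil from monad-loops is replaced by explicit recursion so only base is needed, and run is a hypothetical single-threaded interpreter (so the CAS succeeds on the first try here). It includes the Functor and Applicative instances that current GHC requires before it accepts a Monad instance.

```haskell
{-# LANGUAGE GADTs #-}

import Prelude hiding (read)
import Control.Monad (ap, liftM)
import Data.Maybe (fromMaybe)

newtype IntPtr = IntPtr { ptr :: Int } deriving (Eq, Ord, Show)

data Op r where
  OpRead  :: IntPtr -> Op Int
  OpWrite :: IntPtr -> Int -> Op ()
  OpCAS   :: IntPtr -> Int -> Int -> Op Bool

data Program a where
  Return :: a -> Program a
  Step   :: Op r -> (r -> Program a) -> Program a

instance Functor Program where
  fmap = liftM

instance Applicative Program where
  pure = Return
  (<*>) = ap

instance Monad Program where
  Return x >>= f = f x
  Step i k >>= f = Step i ((>>= f) . k)

lift :: Op a -> Program a
lift i = Step i Return

read :: IntPtr -> Program Int
read p = lift (OpRead p)

cas :: IntPtr -> Int -> Int -> Program Bool
cas p expected new = lift (OpCAS p expected new)

-- The question's add, retrying with plain recursion instead of iterateUntil.
add :: IntPtr -> Int -> Program Int
add p a = do
  value <- read p
  success <- cas p value (value + a)
  if success then pure (value + a) else add p a

-- Hypothetical single-threaded interpreter over an association-list memory.
run :: Program a -> [(Int, Int)] -> (a, [(Int, Int)])
run (Return x) mem = (x, mem)
run (Step op k) mem = case op of
    OpRead (IntPtr q) -> run (k (fromMaybe 0 (lookup q mem))) mem
    OpWrite (IntPtr q) v -> run (k ()) (store q v)
    OpCAS (IntPtr q) expected new
      | fromMaybe 0 (lookup q mem) == expected -> run (k True) (store q new)
      | otherwise -> run (k False) mem
  where
    store q v = (q, v) : filter ((/= q) . fst) mem
```

Here run (add (IntPtr 0) 5) [(0, 10)] yields 15, and address 0 holds 15 afterwards.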
Noting that Program's Monad instance doesn't depend on Op, an obvious generalisation is to parameterise the type of instruction and make Program into a list of any old instruction set.
data Program i a where
    Return :: a -> Program i a
    Step :: i r -> (r -> Program i a) -> Program i a
instance Monad (Program i) where
    -- implementation is exactly the same
So Program i is a monad for free, no matter what i is. This version of Program is a rather general tool for modelling imperative languages.
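To illustrate that Program i is a monad whatever i is, here is a sketch that reuses the same instances with a second, hypothetical instruction set (Console, with Ask and Say) and a pure interpreter that feeds canned input lines. None of these names come from a library, and feeding an empty string once the input runs out is an arbitrary choice made for the sketch.

```haskell
{-# LANGUAGE GADTs #-}

import Control.Monad (ap, liftM)

-- Program parameterised over any instruction set i.
data Program i a where
  Return :: a -> Program i a
  Step   :: i r -> (r -> Program i a) -> Program i a

instance Functor (Program i) where
  fmap = liftM

instance Applicative (Program i) where
  pure = Return
  (<*>) = ap

instance Monad (Program i) where
  Return x >>= f = f x
  Step i k >>= f = Step i ((>>= f) . k)

-- Wrap a single instruction as a one-step program.
instr :: i a -> Program i a
instr i = Step i Return

-- A hypothetical instruction set unrelated to Op.
data Console r where
  Ask :: Console String
  Say :: String -> Console ()

-- Pure interpreter: consume canned input lines, collect output lines.
runConsole :: [String] -> Program Console a -> (a, [String])
runConsole _ (Return x) = (x, [])
runConsole inputs (Step Ask k) = case inputs of
  (line : rest) -> runConsole rest (k line)
  [] -> runConsole [] (k "")
runConsole inputs (Step (Say s) k) =
  let (x, out) = runConsole inputs (k ()) in (x, s : out)

greet :: Program Console String
greet = do
  name <- instr Ask
  instr (Say ("Hello, " ++ name))
  pure name
```

runConsole ["Ada"] greet returns ("Ada", ["Hello, Ada"]); the monad instances were not touched to support Console.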

Haskell function with type IO Int -> Int, without using unsafePerformIO

I have a homework question asking me:
Can you write a Haskell function with type IO Int -> Int (without using unsafePerformIO)? If yes, give the function; if not, explain the reason.
I have tried to write such a function:
test :: IO Int -> Int
test a = do
    x <- a
    return x
But this does not work. I have tried to make it work for a while, and I can't, so I gather that the answer to the question is no. But, I do not understand why it is not possible. Why doesn't it work?
The only functions of type IO Int -> Int are uninteresting, since they must ignore their argument. That is, they must be equivalent to
n :: Int
n = ...
f :: IO Int -> Int
f _ = n
(Technically, there's also f x = x `seq` n, as @Keshav points out.)
This is because there's no way to escape the IO monad (unlike most other monads). This is by design. Consider
getInt :: IO Int
getInt = fmap read getLine
which is a function which reads an integer from stdin. If we could write
n :: Int
n = f getInt
this would be an integer value which can "depend" on the IO action getInt... but must be pure, i.e. must do no IO at all. How can we use an IO action if we must do no IO at all? It turns out that we cannot use it in any way.
The only way to do such an operation meaningfully is to allow the programmer to break the "purity" contract, which is the main idea behind Haskell. GHC gives programmers a few "unsafe" operations for when they are bold enough to declare "trust me, I know what I am doing". One of them is unsafePerformIO. Another is accessing the low-level realWorld# machinery in the GHC.* modules (as @Michael's answer shows). Even the FFI can import a C function while claiming it is pure, essentially letting the user write their own unsafePerformIO.
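Since the result can't leave IO, the usual fix for the question's attempt is to change the type rather than the body: the do-block there already has type IO Int -> IO Int, because return wraps its argument back into IO. A minimal sketch (test' is just an illustrative second spelling of the same function):

```haskell
-- Stay inside IO: run the action, then return a new IO Int.
test :: IO Int -> IO Int
test a = do
  x <- a
  return (x + 1)

-- The same thing written by mapping a pure function over the action.
test' :: IO Int -> IO Int
test' = fmap (+ 1)
```

Both still return an IO Int; the caller decides when to run the action, which is exactly the property an IO Int -> Int function would have to break.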
Well, there are two ways...
If you are happy with a constant function, use this:
test :: IO Int -> Int
test a = 42
Otherwise, you have to do the same thing that unsafePerformIO does (or more specifically unsafeDupablePerformIO).
{-# LANGUAGE MagicHash #-}
{-# LANGUAGE UnboxedTuples #-}
import GHC.Base (IO(..), realWorld#)
unIO (IO m) = case m realWorld# of (# _, r #) -> r
test :: IO Int -> Int
test a = (unIO a) + 1
Have fun...
Disclaimer: this is all really hacky ;)

How to store arbitrary values in a recursive structure or how to build an extensible software architecture?

I'm working on a basic UI toolkit and am trying to figure out the overall architecture.
I am considering using WAI's structure for extensibility. A reduced example of the core structure for my UI:
run :: Application -> IO ()
type Application = Event -> UI -> (Picture, UI)
type Middleware = Application -> Application
In WAI, arbitrary values for Middleware are saved in the vault. I think this is a bad hack for saving arbitrary values, because it isn't transparent, but I can't think of a sufficiently simple structure to replace the vault and give every Middleware a place to save arbitrary values.
I considered recursively storing tuples in tuples:
run :: (Application, x) -> IO ()
type Application = Event -> UI -> (Picture, UI)
type Middleware y x = (Application, x) -> (Application, (y,x))
Or using only lazy lists, to provide a level at which there is no need to separate values (which gives more freedom, but also has more problems):
run :: Application -> IO ()
type Application = [Event -> UI -> (Picture, UI)]
type Middleware = Application -> Application
Actually, I would use a modified lazy list solution. Which other solutions might work?
Note that:
I prefer not to use lens at all.
I know UI -> (Picture, UI) could be defined as State UI Picture.
I'm not aware of a solution regarding monads, transformers or FRP. It would be great to see one.
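A minimal sketch of the tuple-in-tuple variant, using hypothetical stand-ins for Event, UI and Picture and two made-up middlewares (withTitle, withLimit). It shows each middleware's value becoming visible in the type of the composed stack. One consequence of this encoding, worth weighing against the vault: the stored values are fixed when the stack is composed, so a value that must change per event would still have to be threaded through UI or the application itself.

```haskell
-- Hypothetical stand-ins for the toolkit's real types.
type Event = String
type UI = Int
type Picture = String

type Application = Event -> UI -> (Picture, UI)

-- The question's idea: each middleware pushes its own value y
-- onto the front of the nested tuple.
type Middleware y x = (Application, x) -> (Application, (y, x))

baseApp :: Application
baseApp ev ui = ("saw " ++ ev, ui + 1)

-- Stores a title and prefixes every picture with it.
withTitle :: String -> Middleware String x
withTitle title (app, x) = (titled, (title, x))
  where
    titled ev ui = let (pic, ui') = app ev ui in (title ++ ": " ++ pic, ui')

-- Stores a number without changing behaviour.
withLimit :: Int -> Middleware Int x
withLimit n (app, x) = (app, (n, x))

-- The accumulated values show up in the type: (String, (Int, ())).
stack :: (Application, (String, (Int, ())))
stack = withTitle "main" (withLimit 10 (baseApp, ()))
```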
Lenses provide a general way to reference data type fields so that you can extend or refactor your data set without breaking backwards compatibility. I'll use the lens-family and lens-family-th libraries to illustrate this, since they are lighter dependencies than lens.
Let's begin with a simple record with two fields:
{-# LANGUAGE TemplateHaskell #-}
import Lens.Family2
import Lens.Family2.TH
data Example = Example
    { _int :: Int
    , _str :: String
    }
makeLenses ''Example
-- This creates these lenses:
-- int :: Lens' Example Int
-- str :: Lens' Example String
Now you can write stateful code that references fields of your data structure. You can use Lens.Family2.State.Strict for this purpose:
import Lens.Family2.State.Strict
import Control.Monad.Trans.State.Strict (State, get)
-- Everything here also works for `StateT Example IO`
example :: State Example Int
example = do
    s <- use str          -- Read the `String`
    str .= s ++ "!"       -- Set the `String`
    int += 2              -- Modify the `Int`
    zoom int $ do         -- This sub-`do` block has type: `State Int Int`
        m <- get
        return (m + 1)
The key thing to note is that I can update my data type, and the above code will still compile. Add a new field to Example and everything will still work:
data Example = Example
    { _int :: Int
    , _str :: String
    , _char :: Char
    }
makeLenses ''Example
-- int :: Lens' Example Int
-- str :: Lens' Example String
-- char :: Lens' Example Char
However, we can actually go a step further and completely refactor our Example type like this:
data Example = Example
    { _example2 :: Example2
    , _char :: Char
    }
data Example2 = Example2
    { _int2 :: Int
    , _str2 :: String
    }
makeLenses ''Example
-- char :: Lens' Example Char
-- example2 :: Lens' Example Example2
makeLenses ''Example2
-- int2 :: Lens' Example2 Int
-- str2 :: Lens' Example2 String
Do we have to break our old code? No! All we have to do is add the following two lenses to support backwards compatibility:
int :: Lens' Example Int
int = example2 . int2
str :: Lens' Example String
str = example2 . str2
Now all the old code still works without any changes, despite the intrusive refactoring of our Example type.
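The backwards-compatibility trick relies only on lenses being ordinary functions that compose with (.). To check that without lens-family or Template Haskell, here is a sketch with hand-rolled van Laarhoven lenses (the encoding lens-family is built on); view, over and the hand-written lenses stand in for the library's combinators and the makeLenses output, and are not its actual definitions.

```haskell
{-# LANGUAGE RankNTypes #-}

import Data.Functor.Const (Const (..))
import Data.Functor.Identity (Identity (..))

-- Van Laarhoven lens: a function polymorphic in the Functor f.
type Lens' s a = forall f. Functor f => (a -> f a) -> s -> f s

-- Read through a lens by choosing f = Const a.
view :: Lens' s a -> s -> a
view l s = getConst (l Const s)

-- Modify through a lens by choosing f = Identity.
over :: Lens' s a -> (a -> a) -> s -> s
over l f s = runIdentity (l (Identity . f) s)

-- The refactored types from the answer.
data Example2 = Example2 { _int2 :: Int, _str2 :: String } deriving (Eq, Show)
data Example = Example { _example2 :: Example2, _char :: Char } deriving (Eq, Show)

-- Hand-written equivalents of the generated lenses.
example2 :: Lens' Example Example2
example2 f (Example e c) = fmap (\e' -> Example e' c) (f e)

int2 :: Lens' Example2 Int
int2 f (Example2 i s) = fmap (\i' -> Example2 i' s) (f i)

str2 :: Lens' Example2 String
str2 f (Example2 i s) = fmap (\s' -> Example2 i s') (f s)

-- Compatibility lenses: old code using int and str keeps compiling.
int :: Lens' Example Int
int = example2 . int2

str :: Lens' Example String
str = example2 . str2
```

With e = Example (Example2 1 "hi") 'x', view int e is 1 and over int (+ 2) e rebuilds the nested record with 3, even though int no longer corresponds to a field of Example.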
In fact, this works for more than just records. You can do the exact same thing for sum types, too (a.k.a. algebraic data types or enums). For example, suppose we have this type:
data Example3 = A String | B Int
makeTraversals ''Example3
-- This creates these `Traversal'`s:
-- _A :: Traversal' Example3 String
-- _B :: Traversal' Example3 Int
Many of the things we do with sum types can similarly be re-expressed in terms of Traversal's. The notable exception is pattern matching: it's actually possible to implement pattern matching with totality checking using Traversals, but it's currently verbose.
However, the same point holds: if you express all your sum type operations in terms of Traversal's, then you can greatly refactor your sum type and just update the appropriate Traversal's to preserve backwards compatibility.
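A sketch of hand-written traversals for Example3, standing in for what makeTraversals generates (the exact generated code may differ), plus hand-rolled toListOf and over in the same van Laarhoven encoding. Each traversal focuses on zero or one value depending on the constructor, so code written against _A and _B never matches on A or B directly.

```haskell
{-# LANGUAGE RankNTypes #-}

import Data.Functor.Const (Const (..))
import Data.Functor.Identity (Identity (..))

-- Van Laarhoven traversal: like a lens, but needs Applicative.
type Traversal' s a = forall f. Applicative f => (a -> f a) -> s -> f s

data Example3 = A String | B Int deriving (Eq, Show)

-- Focus on the String inside A; leave any other constructor unchanged.
_A :: Traversal' Example3 String
_A f (A s) = A <$> f s
_A _ other = pure other

-- Focus on the Int inside B; leave any other constructor unchanged.
_B :: Traversal' Example3 Int
_B f (B n) = B <$> f n
_B _ other = pure other

-- Collect every focused value, using Const with the list Monoid.
toListOf :: Traversal' s a -> s -> [a]
toListOf t = getConst . t (\a -> Const [a])

-- Modify every focused value.
over :: Traversal' s a -> (a -> a) -> s -> s
over t f = runIdentity . t (Identity . f)
```

If Example3 is later refactored, only _A and _B need updating for code that uses toListOf and over through them to keep working.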
Finally: note that the true analog of sum type constructors are Prisms (which let you build values using the constructors in addition to pattern matching). Those are not supported by the lens-family family of libraries, but they are provided by lens and you can implement them yourself using just a profunctors dependency if you want.
Also, if you're wondering what the lens analog of a newtype is, it's an Iso', and that also minimally requires a profunctors dependency.
Also, everything I've said works for referencing multiple fields of recursive types (using Folds). Literally anything you can imagine wanting to reference in a data type in a backwards-compatible way is encompassed by the lens library.

What are hashes (#) used for in the library's source?

I was trying to figure out how MVars work, and I came across this bit of code:
-- |Create an 'MVar' which is initially empty.
newEmptyMVar :: IO (MVar a)
newEmptyMVar = IO $ \ s# ->
    case newMVar# s# of
        (# s2#, svar# #) -> (# s2#, MVar svar# #)
Besides being confusingly mutually recursive with newMVar, it's also littered with hashes (#).
Between the two, I can't figure out how it works. I know that this is basically just a pseudo-constructor for MVar, but the rest of the module (most of the library, actually) contains them, and I can't find anything on them. Googling "Haskell hashs" didn't yield anything relevant.
They're (literally) magic hashes. They distinguish GHC's primitives, like addition, unboxed types, and unboxed tuples. You can enable writing them with
{-# LANGUAGE MagicHash #-}
Now you can import the stubs that let you use them with
import GHC.Exts
unboxed :: Int# -> Int# -> Int#
unboxed a# b# = a# +# b#
boxed :: Int -> Int -> Int
boxed (I# a#) (I# b#) = I# (unboxed a# b#)
This is actually kind of nifty when you think about it: by wrapping the magical, strict primitives like this, we can handle lazy Ints and Chars uniformly at the runtime-system level.
Because primitives are not boxed, they're segregated at the kind level. This means that Int# doesn't have the kind * like normal types, which also means something like
kindClash :: Int# -> Int#
kindClash = id -- id expects boxed types
won't compile.
To elaborate further on your code: newEmptyMVar calls newMVar#, a compiler primitive in GHC that allocates a new mutable variable. Despite the similar name, newMVar# is not newMVar, so this isn't mutual recursion so much as a thin wrapper over a compiler primitive. There's also some darkness gathering at the corners of this function, since we're treating IO as a perverse state monad, but let's not look too closely at that. I like my sanity too much.
I don't use primitives in everyday code, nor should you. They come up when implementing crazy optimized hotspots, or near primitive abstractions like what you're looking at.
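For comparison, a minimal sketch of the everyday MVar API from Control.Concurrent.MVar, which wraps the same primitive operations and needs no MagicHash. handOff is an illustrative name.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

-- Hand a computed value from a forked thread back to the caller.
handOff :: Int -> IO Int
handOff n = do
  box <- newEmptyMVar               -- starts empty, like the code in the question
  _ <- forkIO (putMVar box (n * 2)) -- another thread fills it
  takeMVar box                      -- blocks until the value arrives
```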

Plug new FFI method into GHC

Is there a way to plug a Haskell function of type
myFFI :: (C a) => String -> IO a
(where C is some typeclass describing the types of variables I can import) into GHC as an FFI scheme so that I can write in my Haskell program stuff like
foreign import myFFI "foo" foo :: T1 -> T2
that gets compiled into a call to foo = unsafePerformIO $ myFFI "foo" :: T1 -> T2?
I imagine this could be done by modifying GHC, but is there a way to do it via a plugin I can write without touching the GHC codebase proper?
To answer the question in the comments (since the main question is answered with "use TH"), you can use TH as well to collect a list of all the names you've thus bound. Then, at startup, an init call can walk through that and force them.
There is no requirement that the imported function's type be in the IO monad in the first place.
foreign import ccall sin :: Double -> Double
is perfectly legit, but leads to undefined behavior if sin is impure.
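A minimal sketch of both styles of import, assuming the C math library is linked, as it typically is for GHC-compiled programs: sin imported at a pure type, and cos imported at an IO type. c_sin and c_cosIO are illustrative names; the pure import is only sound because sin has no side effects.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}

-- Pure type: the caller may treat c_sin like any other Haskell function.
foreign import ccall unsafe "math.h sin" c_sin :: Double -> Double

-- IO type: the call is sequenced like any other IO action.
foreign import ccall unsafe "math.h cos" c_cosIO :: Double -> IO Double
```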

Resources