Haskell: Can a function be compiled?

Consider a simple Haskell Brainf*ck interpreter. Just look at the interpret function.
import Prelude hiding (Either(..))
import Control.Monad
import Data.Char (ord, chr)
-- function in question
interpret :: String -> IO ()
interpret strprog = let (prog, []) = parse strprog
                    in execBF prog
interpretFile :: FilePath -> IO ()
interpretFile fp = readFile fp >>= interpret
type BF = [BFInstr]
data BFInstr = Left | Right | Inc | Dec | Input | Output | Loop BF
type Tape = ([Integer], [Integer])
emptyTape = (repeat 0, repeat 0)
execBFTape :: Tape -> BF -> IO Tape
execBFTape = foldM doBF
execBF :: BF -> IO ()
execBF prog = do
  execBFTape emptyTape prog
  return ()
doBF :: Tape -> BFInstr -> IO Tape
doBF ((x:lefts), rights) Left = return (lefts, x:rights)
doBF (lefts, (x:rights)) Right = return (x:lefts, rights)
doBF (left, (x:rights)) Inc = return (left, (x+1):rights)
doBF (left, (x:rights)) Dec = return (left, (x-1):rights)
doBF (left, (_:rights)) Input = getChar >>= \c -> return (left, fromIntegral (ord c):rights)
doBF t@(_, (x: _)) Output = putChar (chr (fromIntegral x)) >> return t
doBF t@(left, (x: _)) (Loop bf) = if x == 0
  then return t
  else do t' <- execBFTape t bf
          doBF t' (Loop bf)
simpleCommands = [('<', Left),
                  ('>', Right),
                  (',', Input),
                  ('.', Output),
                  ('+', Inc),
                  ('-', Dec)]
parse :: String -> (BF, String)
parse [] = ([], [])
parse (char:prog) = case lookup char simpleCommands of
  Just command -> let (rest, prog') = parse prog
                  in (command : rest, prog')
  Nothing ->
    case char of
      ']' -> ([], prog)
      '[' -> let (loop, prog') = parse prog
                 (rest, prog'') = parse prog'
             in (Loop loop:rest, prog'')
      _ -> parse prog
So I have a function applied like interpret "[->+<]". This gives me an IO () monadic action which executes the given program. It has the right type to be a main of some program.
Let's say I would like to have this action compiled to an executable, that is, I would like to generate an executable file with the result of interpret ... to be the main function. Of course, this executable would have to contain the GHC runtime system (for infinite lists, integer arithmetic etc.).
Questions:
It is my opinion that it is not possible at all to just take the monadic action and save it to be a new file. Is this true?
How could one go about reaching a comparable solution? Do the GHC API and hint help?
EDIT
Sorry, I oversimplified in the original question. Of course, I can just write a file like this:
main = interpret "..."
But this is not what we usually do when we try to compile something, so consider interpretFile :: FilePath -> IO () instead. Let the BF program be saved in a file (helloworld.bf).
How would I go about creating an executable which executes the contents of helloworld.bf without actually needing the file?
$ ./MyBfCompiler helloworld.bf -o helloworld

The answer is basically no.
There are many ways to construct IO values:
Built in functions like putStrLn
Monad operations like return or >>=
Once you have an IO value there are three ways to break it down:
Set main equal to the value
unsafePerformIO
As the return value of an exported C function
All of these break down into converting an IO a into an a. There is no other way to inspect it to see what it does.
Similarly the only thing you can do with functions is put them in variables or call them (or convert them to C function pointers).
There is no sane way to otherwise inspect a function.
One thing you could do, which isn’t compiling but linking, is to have your interpreter’s main function run on some external C string and build that into a static object. Your “compiler” could then make a new object containing the program as this C string and link it against what you already have.
The theory of partial evaluation (the Futamura projections) says that partially evaluating an interpreter with respect to a fixed program gives you a compiled program, and partially evaluating the partial evaluator itself with respect to the interpreter gives you a compiler. GHC, however, is not a sufficiently advanced partial evaluator.
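A rough way to get the effect of the linking idea above without handling object files yourself is to let the "compiler" generate a tiny Haskell module with the program text baked in as a string literal, and then hand that module to GHC. The sketch below only shows the generation step. The module name BFInterpreter and the function generateMain are assumptions made for illustration; they are not part of the question's code.

```haskell
-- Sketch: turn Brainfuck source text into the text of a Haskell Main module.
-- Assumption: the interpreter from the question lives in a module called
-- BFInterpreter (hypothetical name) that exports interpret :: String -> IO ().
generateMain :: String -> String
generateMain bfSource = unlines
  [ "module Main (main) where"
  , "import BFInterpreter (interpret)"
  , "main :: IO ()"
    -- show renders the program as a valid, fully escaped Haskell string literal
  , "main = interpret " ++ show bfSource
  ]
```

MyBfCompiler could write this text to a temporary Main.hs and run ghc on it. The resulting executable no longer needs helloworld.bf, although it still contains the whole interpreter and the GHC runtime rather than natively compiled Brainfuck code.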

I’m not sure whether you’re asking how you write a compiler that can take as its input a file such as helloworld.bf, or how you compile a Haskell program that runs helloworld.bf.
In the former case, you would want something a little more fleshed out than this:
import System.Environment (getArgs)
main :: IO ()
main = do
  (fileName:_) <- getArgs -- getArgs does not include the program name
  source <- readFile fileName
  interpret source
interpret :: String -> IO ()
interpret = undefined -- You can fill in this piddly little detail yourself.
If you want the latter, there are a few different options. First, you can store the contents of your *.bf file in a string constant (or better yet, a Text or strict ByteString), and pass that to your interpreter function. I’d be surprised if GHC is optimistic enough to fully inline and expand that call at compile time, but in principle a Haskell compiler could.
The second is to turn Brainfuck into a domain-specific language with operators you define, so that you can actually write something like
interpret [^<,^+,^>,^.]
If you define (^<) and the other operators, the Brainfuck commands will compile to bytecode representing the Brainfuck program.
In this case, there isn’t an obvious benefit over the first approach, but with a more structured language, you can do an optimization pass, compile the source to stack-based bytecode more suitable for an interpreter to execute, or generate a more complex AST.
You might also express this idea as
interpret
(^< ^+ ^> ^.)
input
Here, if the Brainfuck commands are higher-order functions with right-to-left precedence, and interpret bf input = (bf begin) input, the Brainfuck code would simply compile to a function that the interpreter calls. This has the best chance of being turned into fast native code.
Previous Answer
In certain cases, a compiler can inline a function call (there are pragmas in GHC to tell it to do this). The compiler is also more likely to do what you want if you name the closure, such as:
main = interpret foo
In GHC, you can give the compiler a hint by adding
{-# INLINE main #-}
or even
{-# INLINE interpret #-}
You can check what code GHC generated by compiling the module with -S and looking through the generated assembly.

Related

Haskell Input to create a String List

I would like to allow a user to build a list from a series of inputs in Haskell.
The getLine function would be called recursively until the stopping case ("Y") is input, at which point the list is returned.
I know the function needs to be in a similar format to below. I am having trouble assigning the correct type signatures - I think I need to include the IO type somewhere.
getList :: [String] -> [String]
getList list = do line <- getLine
                  if line == "Y"
                    then return list
                    else getList (line : list)
So there's a bunch of things that you need to understand. One of them is the IO x type. A value of this type is a computer program that, when later run, will do something and produce a value of type x. So getLine doesn't do anything by itself; it just is a certain sort of program. Same with let p = putStrLn "hello!". I can sequence p into my program multiple times and it will print hello! multiple times, because the IO () is a program, as a value which Haskell happens to be able to talk about and manipulate. If this were TypeScript I would say type IO<x> = { run: () => Promise<x> } and emphatically that type says that the side-effecting action has not been run yet.
So how do we manipulate these values when the value is a program, for example one that fetches the current system time?
The most fundamental way to chain such programs together is to take a program that produces an x (an IO x) and a Haskell function which takes an x and constructs a program which produces a y (an x -> IO y), and combine them into a resulting program producing a y (an IO y). This function is called >>= and pronounced "bind". In fact this way is universal, if we add a function which takes any Haskell value of type x and produces a program which does nothing and produces that value (return :: x -> IO x). This allows you to use, for example, the Prelude function fmap f = (>>= return . f) which takes an a -> b and applies it to an IO a to produce an IO b.
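To make the claim about fmap concrete, here is a minimal sketch that rebuilds it for IO from nothing but >>= and return. The names fmapIO and lineLength are made up for this illustration; in real code you would just use the Prelude's fmap.

```haskell
-- fmap for IO, written using only (>>=) and return.
-- fmapIO is a hypothetical name; it behaves like the Prelude's fmap on IO.
fmapIO :: (a -> b) -> IO a -> IO b
fmapIO f action = action >>= \x -> return (f x)

-- A program that, when it is eventually run, reads a line and yields its length.
-- Defining it runs nothing; it is only a description of what to do.
lineLength :: IO Int
lineLength = fmapIO length getLine
```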
It is so common to say things like getLine >>= \line -> putStrLn (upcase line ++ "!") that we invented do-notation, writing this as
do
  line <- getLine
  putStrLn (upcase line ++ "!")
Notice that it's the same basic deal; the last line needs to be an IO y for some y.
The last thing you need to know in Haskell is the convention which actually gets these things run. That is that, in your Haskell source code, you are supposed to create an IO () (a program whose value doesn't matter) called Main.main, and the Haskell compiler is supposed to take this program which you described, and give it to you as an executable which you can run whenever you want. As a very special case, the GHCi interpreter will notice if you produce an IO x expression at the top level and will immediately run it for you, but that is very different from how the rest of the language works. For the most part, Haskell says, describe the program and I will give it to you.
Now that you know that Haskell has no magic and the Haskell IO x type just is a static representation of a computer program as a value, rather than something which does side-effecting stuff when you "reduce" it (like it is in other languages), we can turn to your getList. Clearly getList :: IO [String] makes the most sense based on what you said: a program which allows a user to build a list from a series of inputs.
Now to build the internals, you've got the right guess: we've got to start with a getLine and either finish off the list or continue accepting inputs, prepending the line to the list:
getList = do
  line <- getLine
  if line == "exit"
    then return []
    else fmap (line:) getList
You've also identified another way to do it, which depends on taking a list of strings and producing a new list:
getList :: IO [String]
getList = fmap reverse (go []) where
  go xs = do
    x <- getLine
    if x == "exit"
      then return xs
      else go (x : xs)
There are probably several other ways to do it.

Is it possible to 'read' a function in Haskell?

new user, semi-noobie Haskell programmer here. I've been looking through 'Write yourself a Scheme in 48 hours' and it occurred to me that, though it would be extremely unsafe in practice, it would be interesting to see if a Haskell program could 'read' a function.
For example, read "+" :: Num a => a -> a -> a -- (that is the type of (+) )
The above example did not work, however. Any ideas? I know this is a really dumb thing to do in practice, but it would be really cool if it were possible, right?
Haskell is a statically typed, compiled language, but you can still interpret a string as a function at run time by using Language.Haskell.Interpreter (from the hint package).
A minimal example that reads a binary function with type Int -> Int -> Int is:
import Language.Haskell.Interpreter
import System.Environment (getArgs)
main :: IO ()
main = do
  args <- getArgs
  -- check that head args exists!
  errorOrF <- runInterpreter $ do
    setImports ["Prelude"]
    interpret (head args) (as :: Int -> Int -> Int)
  case errorOrF of
    Left errs -> print errs
    Right f -> print $ f 1 2
You can call this program in this way (here I assume the filename with the code is test.hs):
> ghc test.hs
...
> ./test "\\x y -> x + y"
3
The core of the program is runInterpreter; that is where the interpreter interprets the String. We first add the Prelude module to the context with setImports to make available, for example, the + function. Then we call interpret to interpret the first argument as a function, and we use as :: Int -> Int -> Int to enforce the type.
The result of runInterpreter is an Either InterpreterError a, where a is your type. If the result is Left then you have an error, else you have your function or value. Once you have extracted it from Right, you can use it as you use any Haskell function. See f 1 2 above, for example.
If you want a more complete example, you can check haskell-awk, a project by gelisam and me that implements an awk-like command-line utility which uses Haskell code instead of AWK code. We use Language.Haskell.Interpreter to interpret the user's function.
The general answer is that, no, you cannot. Functions are very "opaque" in Haskell generally—the only way you can analyze them is to apply arguments to them (or use typeclasses to pull information out of the type, but that's different).
This means it's very difficult to create a dynamic function in any sort of specialized or simplified way. The best you can do is embed a parser, interpreter, and serialization/deserialization mechanism to another language and then parse strings of that language and execute them in the interpreter.
Of course, if your interpreted language is just Haskell (such as what you get using the hint package) then you can do what you're looking for.
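If a full Haskell interpreter is more than you need, the "embed your own small language" route can be very small indeed. The following is a sketch under the assumption that only a fixed set of binary Int operators has to be recognised; readOp and its table are hypothetical names, not a library API.

```haskell
-- A tiny "language" of binary integer operators, looked up by name.
-- Unlike read, this can only ever return functions listed in the table.
readOp :: String -> Maybe (Int -> Int -> Int)
readOp name = lookup name table
  where
    table =
      [ ("+", (+))
      , ("-", (-))
      , ("*", (*))
      , ("max", max)
      ]
```

For example, fmap (\f -> f 1 2) (readOp "+") evaluates to Just 3, and an unknown name such as "nope" gives Nothing instead of a run-time error.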

Can I create a function in Haskell that will encapsulate reading data from file and returning me a simple list of data?

Consider the code below taken from a working example I've built to help me learn Haskell. This code parses a CSV file containing stock quotes downloaded from Yahoo into a nice simple list of bars with which I can then work.
My question: how can I write a function that will take a file name as its parameter and return an OHLCBarList so that the first four lines inside main can be properly encapsulated?
In other words, how can I implement (without getting all sorts of errors about IO stuff) the function whose type would be
getBarsFromFile :: Filename -> OHLCBarList
so that the grunt work that was being done in the first four lines of main can be properly encapsulated?
I've tried to do this myself but with my limited Haskell knowledge, I'm failing miserably.
import qualified Data.ByteString as BS
type Filename = String
getContentsOfFile :: Filename -> IO BS.ByteString
barParser :: Parser Bar
barParser = do
  time <- timeParser
  char ','
  open <- double
  char ','
  high <- double
  char ','
  low <- double
  char ','
  close <- double
  char ','
  volume <- decimal
  char ','
  return $ Bar Bar1Day time open high low close volume
type OHLCBar = (UTCTime, Double, Double, Double, Double)
type OHLCBarList = [OHLCBar]
barsToBarList :: [Either String Bar] -> OHLCBarList
main :: IO ()
main = do
  contents :: C.ByteString <- getContentsOfFile "PriceData/Daily/yhoo1.csv" --PriceData/Daily/Yhoo.csv"
  let lineList :: [C.ByteString] = C.lines contents -- Break the contents into a list of lines
  let bars :: [Either String Bar] = map (parseOnly barParser) lineList -- Using the attoparsec
  let ohlcBarList :: OHLCBarList = barsToBarList bars -- Now I have a nice simple list of tuples with which to work
  --- Now I can do simple operations like
  print $ ohlcBarList !! 0
If you really want your function to have type Filename -> OHLCBarList, it can't be done.* Reading the contents of a file is an IO operation, and Haskell's IO monad is specifically designed so that values in the IO monad can never leave. If this restriction were broken, it would (in general) mess with a lot of things. Instead of doing this, you have two options: make the type of getBarsFromFile be Filename -> IO OHLCBarList — thus essentially copying the first four lines of main — or write a function with type C.ByteString -> OHLCBarList that the output of getContentsOfFile can be piped through to encapsulate lines 2 through 4 of main.
* Technically, it can be done, but you really, really, really shouldn't even try, especially if you're new to Haskell.
Others have explained that the correct type of your function has to be Filename -> IO OHLCBarList; I'd like to try and give you some insight as to why the compiler imposes this draconian measure on you.
Imperative programming is all about managing state: "do certain operations to certain bits of memory in sequence". When they grow large, procedural programs become brittle; we need a way of limiting the scope of state changes. OO programs encapsulate state in classes but the paradigm is not fundamentally different: you can call the same method twice and get different results. The output of the method depends on the (hidden) state of the object.
Functional programming goes all the way and bans mutable state entirely. A Haskell function, when called with certain inputs, will always produce the same output. Simple examples of pure functions are mathematical operators like + and *, or most of the list-processing functions like map. Pure functions are all about the inputs and outputs, not managing internal state.
This allows the compiler to be very smart in optimising your program (for example, it can safely collapse duplicated code for you), and helps the programmer not to make mistakes: you can't put the system in an invalid state if there is none! We like pure functions.
The exception to the rule is IO. Code that performs IO is impure by definition: you could call getLine a hundred times and never get the same result, because it depends on what the user typed. Haskell handles this using the type system: all impure functions are marred with the IO type. IO can be thought of as a dependency on the state of the real world, sort of like World -> (NewWorld, a)
To summarise: pure functions are good because they are easy to reason about; this is why Haskell makes functions pure by default. Any impure code has to be labelled as such with an IO type signature; this tells the compiler and the reader to be careful with this function. So your function which reads from a file (a fundamentally impure action) but returns a pure value can't exist.
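The World -> (NewWorld, a) picture can be played with in ordinary pure code. What follows is a toy model only, assuming the "world" is nothing more than the list of lines the user has yet to type; it is not how GHC implements IO, and every name in it (World, Action, getLineModel, andThen, pureAction, twoLines) is invented for the sketch.

```haskell
-- Toy model: the world is just the input lines that have not been read yet.
type World = [String]

-- An action is a function from the current world to a new world and a result.
newtype Action a = Action { runAction :: World -> (World, a) }

-- Reading a line consumes one line of the world (empty string if none is left).
getLineModel :: Action String
getLineModel = Action $ \world -> case world of
  []       -> ([], "")
  (l : ls) -> (ls, l)

-- An action that leaves the world untouched and just yields a value.
pureAction :: a -> Action a
pureAction x = Action $ \world -> (world, x)

-- Sequencing threads the world from the first action into the second.
andThen :: Action a -> (a -> Action b) -> Action b
andThen (Action step) next = Action $ \w0 ->
  let (w1, x) = step w0
  in runAction (next x) w1

-- Two reads give two different results because each sees a different world.
twoLines :: Action (String, String)
twoLines =
  getLineModel `andThen` \a ->
  getLineModel `andThen` \b ->
  pureAction (a, b)
```

In this model getLineModel itself is a perfectly pure value; the "different result every time" comes entirely from the world that is threaded through.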
Addendum in response to your comment
You can still write pure functions to operate on data that was obtained impurely. Consider the following straw-man:
main :: IO ()
main = do
  putStrLn "Enter the numbers you want me to process, separated by spaces"
  line <- getLine
  let numberStrings = words line
  let numbers = map read numberStrings
  putStrLn $ "The result of the calculation is " ++ (show $ foldr1 (*) numbers + 10)
Lots of code inside IO here. Let's extract some functions:
import Prelude hiding (product) -- avoid a clash with the product defined below
main :: IO ()
main = do
  putStrLn "Enter the numbers you want me to process, separated by spaces"
  result <- fmap processLine getLine -- fmap :: (a -> b) -> IO a -> IO b
                                     -- runs an impure result through a pure function
                                     -- without leaving IO
  putStrLn $ "The result of the calculation is " ++ result
processLine :: String -> String -- look ma, no IO!
processLine = show . calculate . readNumbers
readNumbers :: String -> [Int]
readNumbers = map read . words
calculate :: [Int] -> Int
calculate numbers = product numbers + 10
product :: [Int] -> Int
product = foldr1 (*)
I've pulled logic out of main into pure functions which are easier to read, easier for the compiler to optimise, and more reusable (and so more testable). The program as a whole still lives inside IO because the data is obtained impurely (see the last part of this answer for a more thorough treatment of this argument). Impure data can be piped through pure functions using fmap and other combinators; you should try to put as little logic in main as possible.
Your code does seem to be most of the way there; as others have suggested you could extract lines 2-4 of your main into another function.
In other words, how can I implement (without getting all sorts of errors about IO stuff) the function whose type would be
getBarsFromFile :: Filename -> OHLCBarList
so that the grunt work that was being done in the first four lines of main can be properly encapsulated?
You cannot do this without getting all sorts of errors about IO stuff because this type for getBarsFromFile misses an IO. Probably that's what the errors about IO stuff are trying to tell you. Did you try understanding and fixing the errors?
In your situation, I would start by abstracting over the second to fourth line of your main in a function:
parseBars :: ByteString -> OHLCBarList
And then I would combine this function with getContentsOfFile to get:
getBarsFromFile :: FilePath -> IO OHLCBarList
This I would call in main.
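The same split can be sketched end to end without attoparsec. The code below is only an illustration of the shape, assuming a much simpler format than the question's: Row, splitOn, parseRows and getRowsFromFile are hypothetical stand-ins for OHLCBar, the bar parser, parseBars and getBarsFromFile.

```haskell
-- Stand-in for OHLCBar: a row is just its comma-separated fields.
type Row = [String]

-- Pure helper: split a string on a separator character.
splitOn :: Char -> String -> [String]
splitOn sep s = case break (== sep) s of
  (field, [])       -> [field]
  (field, _ : rest) -> field : splitOn sep rest

-- Pure part: all the parsing, with no IO in the type.
parseRows :: String -> [Row]
parseRows = map (splitOn ',') . lines

-- Impure part: read the file, then map the pure parser over the result.
getRowsFromFile :: FilePath -> IO [Row]
getRowsFromFile path = fmap parseRows (readFile path)
```

In main this would be used as rows <- getRowsFromFile "some.csv", after which rows is an ordinary list that pure functions can work on.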

Reading a file into an array of strings

I'm pretty new to Haskell, and am trying to simply read a file into a list of strings. I'd like one line of the file per element of the list. But I'm running into a type issue that I don't understand. Here's what I've written for my function:
readAllTheLines hdl = (hGetLine hdl):(readAllTheLines hdl)
That compiles fine. I had thought that the file handle needed to be the same one returned from openFile. I attempted to simply show the list from the above function by doing the following:
displayFile path = show (readAllTheLines (openFile path ReadMode))
But when I try to compile it, I get the following error:
filefun.hs:5:43:
Couldn't match expected type 'Handle' with actual type 'IO Handle'
In the return type of a call of 'openFile'
In the first argument of 'readAllTheLines', namely
'(openFile path ReadMode)'
In the first argument of 'show', namely
'(readAllTheLines (openFile path ReadMode))'
So it seems like openFile returns an IO Handle, but hGetLine needs a plain old Handle. Am I misunderstanding the use of these 2 functions? Are they not intended to be used together? Or is there just a piece I'm missing?
Use readFile and lines for a better alternative.
readLines :: FilePath -> IO [String]
readLines = fmap lines . readFile
Coming back to your solution: openFile returns an IO Handle, so you have to run the action to get the Handle. You also have to check whether the Handle is at EOF before reading from it. It is much simpler to just use the above solution.
import System.IO
readAllTheLines :: Handle -> IO [String]
readAllTheLines hndl = do
  eof <- hIsEOF hndl
  notEnded eof
  where notEnded False = do
          line <- hGetLine hndl
          rest <- readAllTheLines hndl
          return (line:rest)
        notEnded True = return []
displayFile :: FilePath -> IO [String]
displayFile path = do
  hndl <- openFile path ReadMode
  readAllTheLines hndl
To add on to Satvik's answer, the example below shows how you can write a function that populates one of Haskell's STArrays, in case you need to perform computations on a truly random-access data type.
Code Example
Let's say we have the following problem. We have lines in a text file "test.txt", and we need to load them into an array and then display the line found in the center of that file. This kind of computation is exactly the sort of situation where one would want to use a random-access array over a sequentially structured list. Granted, in this example, there may not be a huge difference between using a list and an array, but, generally speaking, list accesses will cost O(n) in time whereas array accesses will give you constant-time performance.
First, let's create our sample text file:
test.txt
This
is
definitely
a
test.
Given the file above, we can use the following Haskell program (located in the same directory as test.txt) to print out the middle line of text, i.e. the word "definitely."
Main.hs
{-# LANGUAGE BlockArguments #-} -- See footnote 1
import Control.Monad.ST (runST, ST)
import Data.Array.MArray (newArray, readArray, writeArray)
import Data.Array.ST (STArray)
import Data.Foldable (for_)
import Data.Ix (Ix) -- See footnote 2
populateArray :: (Integral i, Ix i) => STArray s i e -> [e] -> ST s () -- See footnote 3
populateArray stArray es = for_ (zip [0..] es) (uncurry (writeArray stArray))
middleWord' :: (Integral i, Ix i) => i -> STArray s i String -> ST s String
middleWord' arrayLength = flip readArray (arrayLength `div` 2)
middleWord :: [String] -> String
middleWord ws = runST do
  let len = length ws
  array <- newArray (0, len - 1) "" :: ST s (STArray s Int String)
  populateArray array ws
  middleWord' len array
main :: IO ()
main = do
  ws <- words <$> readFile "test.txt"
  putStrLn $ middleWord ws
Explanation
Starting with the top of Main.hs, the ST s monad and its associated function runST allow us to extract pure values from imperative-style computations with in-place updates in a referentially transparent manner. The module Data.Array.MArray exports the MArray typeclass as an interface for instantiating mutable array data types and provides helper functions for creating, reading, and writing MArrays. These functions can be used in conjunction with STArrays since there is an instance of MArray defined for STArray.
The populateArray function is the crux of our example. It uses for_ to "applicatively" loop over a list of tuples of indices and list elements to fill the given STArray with those list elements, producing a value of type () in the ST s monad.
The middleWord' helper function uses readArray to produce a String (wrapped in the ST s monad) that corresponds to the middle element of a given STArray of Strings.
The middleWord function instantiates a new STArray, uses populateArray to fill the array with values from a provided list of strings, and calls middleWord' to obtain the middle string in the array. runST is applied to this whole ST s monadic computation to extract the pure String result.
We finally use our middleWord function in main to find the middle word in the text file "test.txt".
Further Reading
Haskell's STArray is not the only way to work with arrays in Haskell. There are in fact Arrays, IOArrays, DiffArrays and even "unboxed" versions of all of these array types that avoid using the indirection of pointers to simply store "raw" values. There is a page on the Haskell Wikibook on this topic that may be worth some study. Before that, however, looking at the Wikibook page on mutable objects may give you some insight as to why the ST s monad allows us to safely compute pure values from functions that use imperative/destructive operations.
Footnotes
1 The BlockArguments language extension is what allows us to pass a do block directly to a function without any parentheses or use of the function application operator $.
2 As suggested by the Hackage documentation, Ix is a typeclass mainly meant to be used to specify types for indexing arrays.
3 The use of the Integral and Ix type constraints may be a bit of overkill, but it's used to make our type signatures as general as possible.

Dice Game in Haskell

I'm trying to spew out randomly generated dice for every roll that the user plays. The user has 3 rolls per turn and he gets to play 5 turns (I haven't implemented this part yet and I would appreciate suggestions).
I'm also wondering how I can display the colors randomly. I have the list of tuples in place, but I reckon I need some function that uses random and that list to match those colors. I'm struggling as to how.
module Main where
import System.IO
import System.Random
import Data.List
diceColor = [("Black",1),("Green",2),("Purple",3),("Red",4),("White",5),("Yellow",6)]
{-
randomList :: (RandomGen g) -> Int -> g -> [Integer]
random 0 _ = []
randomList n generator = r : randomList (n-1) newGenerator
where (r, newGenerator) = randomR (1, 6) generator
-}
rand :: Int -> [Int] -> IO ()
rand n rlst = do
  num <- randomRIO (1::Int, 6)
  if n == 0
    then doSomething rlst
    else rand (n-1) (num:rlst)
doSomething x = putStrLn (show (sort x))
main :: IO ()
main = do
  --hSetBuffering stdin LineBuffering
  putStrLn "roll, keep, score?"
  cmd <- getLine
  doYahtzee cmd
  --rand (read cmd) []
doYahtzee :: String -> IO ()
doYahtzee cmd = do
  if cmd == "roll"
    then rand 5 []
    else do print "You won"
There are really a lot of errors sprinkled throughout this code, which suggests to me that you tried to build the whole thing at once. This is a recipe for disaster; you should be building very small things and testing them often in ghci.
Lecture aside, you might find the following facts interesting (in order of the associated errors in your code):
List is deprecated; you should use Data.List instead.
No let is needed for top-level definitions.
Variable names must begin with a lower case letter.
Class prerequisites are separated from a type by =>.
The top-level module block should mainly have definitions; you should associate every where clause (especially the one near randomList) with a definition by either indenting it enough not to be a new line in the module block or keeping it on the same line as the definition you want it to be associated with.
do introduces a block; those things in the block should be indented equally and more than their context.
doYahtzee is declared and used as if it has three arguments, but seems to be defined as if it only has one.
The read function is used to parse a String. Unless you know what it does, using read to parse a String from another String is probably not what you want to do -- especially on user input.
putStrLn only takes one argument, not four, and that argument has to be a String. However, making a guess at what you wanted here, you might like the (!!) and print functions.
dieRoll doesn't seem to be defined anywhere.
It's possible that there are other errors, as well. Stylistically, I recommend that you check out replicateM, randomRs, and forever. You can use hoogle to search for their names and read more about them; in the future, you can also use it to search for functions you wish existed by their type.
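As a small illustration of the replicateM suggestion and of matching rolls to colors, here is a sketch that stays independent of any random-number library. It reuses the question's diceColor table; colorOf and rollColors are hypothetical helpers, and the roll action is passed in as a parameter so that, as an assumption, something like randomRIO (1, 6) from System.Random could be supplied by the caller.

```haskell
import Control.Monad (replicateM)
import Data.Maybe (mapMaybe)

-- The table from the question: color name paired with die face.
diceColor :: [(String, Int)]
diceColor = [("Black",1),("Green",2),("Purple",3),("Red",4),("White",5),("Yellow",6)]

-- Look a face up in the table; Nothing for a face that is not on the die.
colorOf :: Int -> Maybe String
colorOf face = lookup face [(n, name) | (name, n) <- diceColor]

-- Run the given roll action n times and translate each face to its color.
-- rollOne is whatever produces a single face, e.g. a random roll in IO.
rollColors :: Monad m => Int -> m Int -> m [String]
rollColors n rollOne = mapMaybe colorOf <$> replicateM n rollOne
```

Keeping the lookup pure and the rolling abstract means the color logic can be checked in ghci with a fixed roll such as return 2 before any randomness is wired in.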
