Is it possible to 'read' a function in Haskell? - haskell

New user and semi-noobie Haskell programmer here. I've been working through 'Write Yourself a Scheme in 48 Hours', and it occurred to me that, although it would be extremely unsafe in practice, it would be interesting to see whether a Haskell program could 'read' a function.
For example, read "+" :: Num a => a -> a -> a -- (that is the type of (+) )
The above example did not work, however. Any ideas? I know this is a really dumb thing to do in practice, but it would be really cool if it were possible, right?

Haskell is a statically typed, compiled language, but you can still interpret a string as a function by using Language.Haskell.Interpreter from the hint package.
A minimal example that reads a binary function with type Int -> Int -> Int is:
import Language.Haskell.Interpreter
import System.Environment (getArgs)
main :: IO ()
main = do
  args <- getArgs
  -- check that head args exists!
  errorOrF <- runInterpreter $ do
    setImports ["Prelude"]
    interpret (head args) (as :: Int -> Int -> Int)
  case errorOrF of
    Left errs -> print errs
    Right f   -> print $ f 1 2
You can call this program in this way (here I assume the filename with the code is test.hs):
> ghc test.hs
...
> ./test "\\x y -> x + y"
3
The core of the program is runInterpreter, which is where the String actually gets interpreted. We first add the Prelude module to the context with setImports so that, for example, the + function is available. Then we call interpret to interpret the first command-line argument as a function, using (as :: Int -> Int -> Int) to fix its type.
runInterpreter returns (in IO) an Either InterpretError a, where a is your type. If the result is Left, you have an error; otherwise you have your function or value. Once you have extracted it from Right, you can use it like any other Haskell function, as in f 1 2 above.
If you want a more complete example, you can look at haskell-awk, a project by gelisam and me that implements an awk-like command-line utility using Haskell code instead of AWK code. We use Language.Haskell.Interpreter to interpret the user's function.

The general answer is that, no, you cannot. Functions are generally very "opaque" in Haskell: the only way you can analyze them is to apply them to arguments (or use type classes to pull information out of the type, but that's different).
This means it's very difficult to create a dynamic function in any specialized or simplified way. The best you can do is embed a parser, an interpreter, and a serialization/deserialization mechanism for another language, then parse strings of that language and execute them in the interpreter.
Of course, if your interpreted language is just Haskell (such as what you get using the hint package) then you can do what you're looking for.
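As a rough illustration of the "embed an interpreter for another language" route, the sketch below maps operator names to Haskell functions we wrote ourselves and evaluates tiny "<int> <op> <int>" expressions. The names binOps, readBinOp and evalExpr and the expression format are hypothetical; nothing here inspects or deserializes an arbitrary Haskell function, which is exactly the point.

```haskell
import Text.Read (readMaybe)

-- Hypothetical mini-language: the only "functions" that can be read are
-- the ones listed in this table.
binOps :: [(String, Int -> Int -> Int)]
binOps = [("+", (+)), ("-", (-)), ("*", (*)), ("max", max), ("min", min)]

-- The closest safe analogue of read "+" :: Int -> Int -> Int.
readBinOp :: String -> Maybe (Int -> Int -> Int)
readBinOp tok = lookup tok binOps

-- Parse and evaluate an expression such as "3 * 4".
evalExpr :: String -> Either String Int
evalExpr src = case words src of
  [a, op, b] -> do
    x <- maybe (Left ("bad number: " ++ a)) Right (readMaybe a)
    f <- maybe (Left ("unknown operator: " ++ op)) Right (readBinOp op)
    y <- maybe (Left ("bad number: " ++ b)) Right (readMaybe b)
    Right (f x y)
  _ -> Left "expected: <int> <op> <int>"
```

Extending the language means adding entries to the table (or a real parser), never reflecting on Haskell functions.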

Related

How can I get the string of Haskell code (along with the value)

I'd like to get both the string and value of arbitrary Haskell code. For example:
f (1+1) -> (2,"1+1")
The reason I want to do this is because I'm writing a programming language and I'd like to provide an option to interpret the code (for fast running, i.e. scripting) or compile it to Haskell code (for efficient runtime). So for each builtin I only want to provide the implementation once. That is I don't want to say
plusop = ((+),"(+)")
I have some ideas involving reading the raw Haskell source or a separate script that generates a compiler, but these seem much less elegant than what could be done if this is possible.
It looks like QuasiQuotation could make this possible, but I can't figure out how to get the Haskell value of the expression if I use it (I can only get the String).
Is it possible? How can it be done?
I don't know exactly what you want to do, but here's an example of using Template Haskell to do something similar to your example:
-- TH.hs
{-# LANGUAGE TemplateHaskell #-}
module TH where
import Language.Haskell.TH.Syntax
import Language.Haskell.TH.Ppr
showAndRun :: Q Exp -> Q Exp
showAndRun m = do
  x <- m
  let s = pprint x
  [| ($m, s) |]
-- Main.hs
{-# LANGUAGE TemplateHaskell #-}
import TH
main :: IO ()
main = print $(showAndRun [| 1 + 1 |])
$ runhaskell Main.hs
(2,"1 GHC.Num.+ 1")
I don't know how to pretty-print the expression without the qualifying GHC.Num prefix. You could try copying the implementation of Language.Haskell.TH.Ppr and changing it in the necessary places. Perhaps the easiest option is a post-processing step that strips every module qualifier, i.e. each word that starts with a capital letter and ends in a dot.
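A minimal sketch of that post-processing step, assuming pprint's output is plain text such as "1 GHC.Num.+ 1". The name stripQualifiers is hypothetical, and the heuristic drops any capitalised identifier that is directly followed by a dot and a non-space character, so it can misfire on text that is not Haskell code (for example, inside string literals).

```haskell
import Data.Char (isAlphaNum, isSpace, isUpper)

isIdentChar :: Char -> Bool
isIdentChar c = isAlphaNum c || c == '_' || c == '\''

-- Something must follow the dot, otherwise it is not a qualified name.
startsName :: String -> Bool
startsName (r:_) = not (isSpace r)
startsName []    = False

-- Remove module qualifiers such as "GHC.Num." from pretty-printed code.
stripQualifiers :: String -> String
stripQualifiers [] = []
stripQualifiers s@(c:cs)
  | isUpper c
  , (_, '.' : rest) <- span isIdentChar s
  , startsName rest
  = stripQualifiers rest                -- drop "Module." and keep scanning
  | isIdentChar c
  = let (ident, rest) = span isIdentChar s
    in ident ++ stripQualifiers rest    -- copy an unqualified identifier whole
  | otherwise
  = c : stripQualifiers cs              -- copy spaces, operators, brackets
```

In TH.hs this would amount to binding let s = stripQualifiers (pprint x) instead of the raw pprint output.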

Haskell: Can a function be compiled?

Consider a simple Haskell Brainf*ck interpreter. Just look at the interpret function.
import Prelude hiding (Either(..))
import Control.Monad
import Data.Char (ord, chr)
-- function in question
interpret :: String -> IO ()
interpret strprog = let (prog, []) = parse strprog
                    in execBF prog
interpretFile :: FilePath -> IO ()
interpretFile fp = readFile fp >>= interpret
type BF = [BFInstr]
data BFInstr = Left | Right | Inc | Dec | Input | Output | Loop BF
type Tape = ([Integer], [Integer])
emptyTape = (repeat 0, repeat 0)
execBFTape :: Tape -> BF -> IO Tape
execBFTape = foldM doBF
execBF :: BF -> IO ()
execBF prog = do
  execBFTape emptyTape prog
  return ()
doBF :: Tape -> BFInstr -> IO Tape
doBF ((x:lefts), rights) Left = return (lefts, x:rights)
doBF (lefts, (x:rights)) Right = return (x:lefts, rights)
doBF (left, (x:rights)) Inc = return (left, (x+1):rights)
doBF (left, (x:rights)) Dec = return (left, (x-1):rights)
doBF (left, (_:rights)) Input = getChar >>= \c -> return (left, fromIntegral (ord c):rights)
doBF t@(_, (x: _)) Output = putChar (chr (fromIntegral x)) >> return t
doBF t@(left, (x: _)) (Loop bf) = if x == 0
  then return t
  else do t' <- execBFTape t bf
          doBF t' (Loop bf)
simpleCommands = [('<', Left),
                  ('>', Right),
                  (',', Input),
                  ('.', Output),
                  ('+', Inc),
                  ('-', Dec)]
parse :: String -> (BF, String)
parse [] = ([], [])
parse (char:prog) = case lookup char simpleCommands of
  Just command -> let (rest, prog') = parse prog
                  in (command : rest, prog')
  Nothing ->
    case char of
      ']' -> ([], prog)
      '[' -> let (loop, prog') = parse prog
                 (rest, prog'') = parse prog'
             in (Loop loop:rest, prog'')
      _ -> parse prog
So I have a function applied like interpret "[->+<]". This gives me an IO () monadic action which executes the given program. It has the right type to be a main of some program.
Let's say I would like to have this action compiled to an executable, that is, I would like to generate an executable file with the result of interpret ... to be the main function. Of course, this executable would have to contain the GHC runtime system (for infinite lists, integer arithmetic etc.).
Questions:
It is my opinion that it is not possible at all to just take the monadic action and save it to be a new file. Is this true?
How could one go about reaching a comparable solution? Do the GHC API and hint help?
EDIT
Sorry, I oversimplified in the original question. Of course, I can just write a file like this:
main = interpret "..."
But this is not what we usually do when we try to compile something, so consider interpretFile :: FilePath -> IO () instead. Let the BF program be saved in a file (helloworld.bf).
How would I go about creating an executable which executes the contents of helloworld.bf without actually needing the file?
$ ./MyBfCompiler helloworld.bf -o helloworld
The answer is basically no.
There are many ways to construct IO values:
Built in functions like putStrLn
Monad operations like return or >>=
Once you have an IO value there are three ways to break it down:
Set main equal to the value
unsafePerformIO
As the return value of an exported C function
All of these break down into converting an IO a into an a. There is no other way to inspect it to see what it does.
Similarly the only thing you can do with functions is put them in variables or call them (or convert them to C function pointers).
There is no sane way to otherwise inspect a function.
One thing you could do, which isn’t compiling but is linking, is to have your interpreter’s main function run on some external C string, build that into a static object, and then have your “compiler” make a new object containing the program as a C string and link it with what you already have.
There is a theory of partial evaluation (the Futamura projections) which says that partially evaluating an interpreter with respect to a fixed program yields a compiled version of that program, and that partially evaluating a partial evaluator with respect to an interpreter yields a compiler. But GHC is not a sufficiently advanced partial evaluator.
I’m not sure whether you’re asking how you write a compiler that can take as its input a file such as helloworld.bf, or how you compile a Haskell program that runs helloworld.bf.
In the former case, you would want something a little more fleshed out than this:
import System.Environment (getArgs)
main :: IO ()
main = do
  (fileName:_) <- getArgs -- getArgs does not include the program name
  source <- readFile fileName
  interpret source
interpret :: String -> IO ()
interpret = undefined -- You can fill in this piddly little detail yourself.
If you want the latter, there are a few different options. First, you can store the contents of your *.bf file in a string constant (or better yet, a Text or strict ByteString) and pass that to your interpreter function. I’d be surprised if GHC were aggressive enough to fully inline and expand that call at compile time, but in principle a Haskell compiler could.
The second is to turn Brainfuck into a domain-specific language with operators you define, so that you can actually write something like
interpret [^<,^+,^>,^.]
If you define (^<) and the other operators, the Brainfuck commands will compile to bytecode representing the Brainfuck program.
In this case, there isn’t an obvious benefit over the first approach, but with a more structured language, you can do an optimization pass, compile the source to stack-based bytecode more suitable for an interpreter to execute, or generate a more complex AST.
You might also express this idea as
interpret
(^< ^+ ^> ^.)
input
Here, if the Brainfuck commands are higher-order functions with right-to-left precedence, and interpret bf input = (bf begin) input, the Brainfuck code would simply compile to a function that the interpreter calls. This has the best chance of being turned into fast native code.
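One way to read the last idea is as a shallow embedding: every Brainfuck command is an ordinary Haskell function on the machine state, so a program built from them is compiled by GHC along with everything else. The sketch below is hypothetical (Machine, the command names, block, loop and runBF are not from the answer), uses lists of commands rather than the ^< operator syntax, and reads input from and writes output to plain Strings instead of doing IO so that it is easy to test. It also assumes that reading past the end of input stores 0.

```haskell
import Data.Char (chr, ord)
import Data.List (foldl')

-- Machine state for a hypothetical shallow Brainfuck embedding.
data Machine = Machine
  { lefts  :: [Integer] -- cells left of the head, nearest first
  , cell   :: Integer   -- cell under the head
  , rights :: [Integer] -- cells right of the head, nearest first
  , input  :: String    -- unread input
  , output :: String    -- output so far, most recent character first
  }

-- Each command is a plain function, so GHC compiles BF programs written
-- with them like any other Haskell code.
type Cmd = Machine -> Machine

inc, dec, moveL, moveR, getC, putC :: Cmd
inc m = m { cell = cell m + 1 }
dec m = m { cell = cell m - 1 }
moveL m = case lefts m of
  (l:ls) -> m { lefts = ls, cell = l, rights = cell m : rights m }
  []     -> m { cell = 0, rights = cell m : rights m } -- unexplored cells are 0
moveR m = case rights m of
  (r:rs) -> m { rights = rs, cell = r, lefts = cell m : lefts m }
  []     -> m { cell = 0, lefts = cell m : lefts m }
getC m = case input m of
  (c:cs) -> m { cell = fromIntegral (ord c), input = cs }
  []     -> m { cell = 0 } -- assumption: end of input stores 0
putC m = m { output = chr (fromIntegral (cell m)) : output m }

-- Run a sequence of commands left to right.
block :: [Cmd] -> Cmd
block cmds m0 = foldl' (\m c -> c m) m0 cmds

-- [ ... ]: repeat the body while the current cell is non-zero.
loop :: [Cmd] -> Cmd
loop body m
  | cell m == 0 = m
  | otherwise   = loop body (block body m)

-- Run a program on some input and return everything it printed.
runBF :: [Cmd] -> String -> String
runBF prog inp = reverse (output (block prog (Machine [] 0 [] inp [])))
```

For example, the Brainfuck program ,[.,] (echo the input) becomes runBF [getC, loop [putC, getC]], which returns "hi" when given the input "hi".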
Previous Answer
In certain cases, a compiler can inline a function call (there are pragmas in GHC to tell it to do this). The compiler is also more likely to do what you want if you name the closure, such as:
main = interpret foo
In GHC, you can give the compiler a hint by adding
{-# INLINE main #-}
or even
{-# INLINE interpret #-}
You can check what code GHC generated by compiling the module with -ddump-simpl (to see the optimized Core) or -S (to keep the generated assembly) and looking through the output.

Haskell: use of unsafePerformIO for global constant bindings

There are lots of discussions of using unsafePerformIO carefully for global mutable variables, and some language additions to support it (e.g. Data.Global). I have a related but distinct question: using it for global constant bindings. Here’s a usage I consider entirely OK: command-line parsing.
module Main where
--------------------------------------------------------------------------------
import Data.Bool (bool)
import Data.Monoid ((<>))
import Options.Applicative (short, help, execParser, info, helper, fullDesc,
                            progDesc, long, switch)
import System.IO.Unsafe (unsafePerformIO)
--------------------------------------------------------------------------------
data CommandLine = CommandLine
  Bool --quiet
  Bool --verbose
  Bool --force
commandLineParser = CommandLine
  <$> switch
      ( long "quiet"
     <> short 'q'
     <> help "Show only error messages.")
  <*> switch
      ( long "verbose"
     <> short 'v'
     <> help "Show lots of detail.")
  <*> switch
      ( long "force"
     <> short 'f'
     <> help "Do stuff anyway.")
{- Parse the command line, and bind related values globally for
   convenience. This use of unsafePerformIO is OK since the action has no
   side effects and it's idempotent. -}
CommandLine cQuiet cVerbose cForce
  = unsafePerformIO . execParser $ info (helper <*> commandLineParser)
      ( fullDesc
     <> progDesc "example program"
      )
-- Print a message:
say = say' $ not cQuiet -- unless --quiet
verbose = say' cVerbose -- if --verbose
say' = bool (const $ return ()) putStrLn
--------------------------------------------------------------------------------
main :: IO ()
main = do
  verbose "a verbose message"
  say "a regular message"
It is very valuable to be able to refer cQuiet, cVerbose, etc. globally rather than have to pass them around as arguments wherever they’re needed. After all, this is exactly what global identifiers are for: these have a single value that never changes during any run of the program — it just happens that the value is initialized from the outside world rather than declared in the program text.
It makes sense in principle to do the same thing with other sorts of constant data fetched from the outside, e.g. settings from a configuration file, but then an extra point arises: the action which fetches those is not idempotent, unlike reading the command line (I’m slightly abusing the term “idempotent” here, but trust that I’m understood). This just adds the constraint that the action must be performed only once. My question is: what’s the best way to do that with code of this form:
data Config = Foo String | Bar (Maybe String) | Baz Int
readConfig :: IO Config
readConfig = do …
Config foo bar baz = unsafePerformIO readConfig
The doc suggests to me that this is sufficient and none of the precautions mentioned there are needed, but I’m not sure. I’ve seen proposals for adding a top-level syntax inspired by do-notation specifically for such situations:
Config foo bar baz <- readConfig
… which seems like a very good idea; I’d rather be sure the action will be performed at most once than rely on various compiler settings and hope no compiler behavior comes along that breaks existing code.
I feel the fact that these are in fact constants, together with the ugliness involved in passing such things around explicitly despite the fact that they never change, argue strongly for there being a safe and supported way to do this. I’m open to hearing contrary opinions if someone thinks I’m missing an important point here, though.
Updates
The say and verbose uses in the example are not the best, because it’s not values in the IO monad that are the real annoyance — these could easily read the parameters from a global IORef. The problem is the use of such parameters pervasively in pure code, which have to all be rewritten to either take the parameters explicitly (even though these do not change and thus should not need to be function parameters), or be converted to IO which is even worse. I’ll improve the example when I have time.
Another way to think about this: the class of behaviors I’m talking about could be obtained in the following clunky way: run a program that fetches some data via I/O; take the results and substitute them into the template text of the main program as the values of some global bindings; then compile and run the resulting main program. You would then safely have the advantage of referring to those constants easily throughout the program. It seems that it should not be so hard to implement this pattern directly. I phrased the question mentioning unsafePerformIO, but really I’m interested in understanding this kind of behavior, and what the best way to obtain it would be. unsafePerformIO is one way, but it has drawbacks.
known limitations:
With unsafePerformIO, when the data-fetching action happens is not fixed. This may be a feature, so that e.g. an error related to a missing configuration parameter occurs if and only if that parameter is ever actually used. If you need different behavior, you’ll have to force the values with seq as needed.
I don't know if I'd consider top-level command line parsing to always be OK! Specifically, observe what happens with this alternate main when the user provides bad input.
main = do
  putStrLn "Arbitrary program initialization"
  verbose "a verbose message"
  say "a regular message"
  putStrLn "Clean shutdown"
> ./commands -x
Arbitrary program initialization
Invalid option `-x'
Usage: ...
Now in this case you can force one (or all!) of the pure values so that the parser is known to have run by a well-defined point in time.
main = do
  () <- return $ cQuiet `seq` cVerbose `seq` cForce `seq` ()
  -- ...
> ./commands -x
Invalid option `-x'
...
But what happens if you have something like—
forkIO (withArgs newArgs action)
The only sensible thing to do is {-# NOINLINE cQuiet #-} and friends, so some of the precautions in System.IO.Unsafe do apply to you. But even if you patch over this interesting case, note that you have given up the ability to run sub-computations with alternate values. A ReaderT solution using local, for example, doesn't have that drawback.
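A minimal sketch of the NOINLINE idiom for a configuration constant that should be read at most once. Config, readConfig, readCount and timesRead are hypothetical stand-ins (the counter exists only to observe how often the read runs), and the sketch does not address the withArgs/forkIO and reconfiguration concerns raised here; the System.IO.Unsafe documentation lists further precautions.

```haskell
import Data.IORef (IORef, newIORef, readIORef, atomicModifyIORef')
import System.IO.Unsafe (unsafePerformIO)

data Config = Config { verboseFlag :: Bool, greeting :: String }

-- Hypothetical counter, only here to observe how often readConfig runs.
readCount :: IORef Int
readCount = unsafePerformIO (newIORef 0)
{-# NOINLINE readCount #-}

-- Stand-in for a real reader (config file, environment, ...); hypothetical.
readConfig :: IO Config
readConfig = do
  atomicModifyIORef' readCount (\n -> (n + 1, ()))
  pure (Config { verboseFlag = True, greeting = "hello" })

-- The global constant. NOINLINE stops GHC from copying the right-hand
-- side into each use site, which could run readConfig more than once.
config :: Config
config = unsafePerformIO readConfig
{-# NOINLINE config #-}

timesRead :: IO Int
timesRead = readIORef readCount
```

Because config is a single top-level thunk, every use shares the same evaluated value; when that first evaluation happens is still up to the runtime, as the question's "known limitations" note says.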
This seems an even larger drawback to me in the case of reading config files, as long running applications usually are reconfigurable without requiring a stop/start cycle. A top-level pure value precludes reconfiguration.
But maybe this is even more clear if you consider the intersection of both your config files and your command line arguments. In many utilities arguments on the command line override values provided in a config file, an impossible behavior given what you have now.
For toys, sure, go hog wild. For anything else, at least make your top-level value an IORef or MVar. There are some ways to still make the non-unsafePerformIO solutions nicer though. Consider—
data Config = Config { say     :: String -> IO ()
                     , verbose :: String -> IO ()
                     }
mkSay :: Bool -> String -> IO ()
mkSay quiet s | quiet     = return ()
              | otherwise = putStrLn s
mkVerbose :: Bool -> String -> IO ()
mkVerbose verb s | verb      = putStrLn s
                 | otherwise = return ()
-- In some action...
let config = Config (mkSay quietFlag) (mkVerbose verboseFlag)
compute :: Config -> IO Value
compute config = do
  -- ...
  verbose config "Debugging info"
  -- ...
This also respects the spirit of Haskell function signatures, in that it's now clear (without even needing to consider the open world of IO) that your functions' behavior actually does depend on program configuration.
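For pure code, one base-only way to get the "ReaderT with local" behaviour mentioned above, without the transformers package, is the monad instance for functions, ((->) r). The sketch below is hypothetical (Settings, formatLine, formatAll and localSettings are illustrative names): the dependency on configuration shows up in each type, and localSettings plays the role of Reader's local.

```haskell
-- Hypothetical settings record, threaded implicitly through pure code.
data Settings = Settings { quiet :: Bool, indent :: Int }

-- A computation that needs Settings; the function type (->) Settings is a Monad.
type WithSettings a = Settings -> a

-- Pure code that reads the settings without an explicit argument at each call.
formatLine :: String -> WithSettings (Maybe String)
formatLine s = do
  q <- quiet
  n <- indent
  pure (if q then Nothing else Just (replicate n ' ' ++ s))

formatAll :: [String] -> WithSettings [String]
formatAll ls = do
  ms <- mapM formatLine ls
  pure [m | Just m <- ms]

-- Analogue of Reader's local: run a sub-computation with modified settings.
localSettings :: (Settings -> Settings) -> WithSettings a -> WithSettings a
localSettings f m = m . f
```

The settings are supplied once at the top, e.g. formatAll msgs (Settings False 2), and a sub-computation can be run with different values, which a top-level unsafePerformIO constant cannot do.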
-XImplicitParams is useful in this situation.
{-# LANGUAGE ImplicitParams #-}
import Data.Bool (bool)
data CommandLine = CommandLine
  Bool --quiet
  Bool --verbose
  Bool --force
say' :: Bool -> String -> IO ()
say' = bool (const $ return ()) putStrLn
say, verbose :: (?cmdLine :: CommandLine) => String -> IO ()
say = case ?cmdLine of CommandLine cQuiet _ _ -> say' $ not cQuiet
verbose = case ?cmdLine of CommandLine _ cVerbose _ -> say' cVerbose
Anything that is implicitly typed and uses say or verbose will have the ?cmdLine :: CommandLine implicit parameter added to its type.
:type (\s -> say (show s))
(\s -> say (show s))
  :: (Show a, ?cmdLine::CommandLine) => a -> IO ()
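A small sketch of binding the implicit parameter once and overriding it for a sub-computation. It assumes a record version of CommandLine (the field names cQuiet, cVerbose and cForce are illustrative) and uses only pure code, which is where the question says the real pain lies. Implicit-parameter bindings are not recursive, so the ?cmdLine on the right-hand side of reportLoudly's let refers to the caller's value.

```haskell
{-# LANGUAGE ImplicitParams #-}

-- Hypothetical record version of the question's CommandLine.
data CommandLine = CommandLine { cQuiet :: Bool, cVerbose :: Bool, cForce :: Bool }

-- Pure code that reads the flags through the implicit parameter.
visible :: (?cmdLine :: CommandLine) => String -> Maybe String
visible s
  | cQuiet ?cmdLine = Nothing
  | otherwise       = Just s

report :: (?cmdLine :: CommandLine) => [String] -> [String]
report msgs = [m | Just m <- map visible msgs]

-- Bind the parameter once, e.g. right after parsing the command line.
reportWith :: CommandLine -> [String] -> [String]
reportWith cl msgs = let ?cmdLine = cl in report msgs

-- A nested binding overrides the outer value for this sub-computation only.
reportLoudly :: (?cmdLine :: CommandLine) => [String] -> [String]
reportLoudly msgs = let ?cmdLine = (?cmdLine) { cQuiet = False } in report msgs
```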
Two cases from Hackage that come to mind:
The package cmdargs makes use of unsafePerformIO - treating command line arguments as constant.
In the package oeis, the "pure" function getSequenceByID uses unsafePerformIO to return content from a web page on http://oeis.org. It notes in its documentation:
Note that the result is not in the IO monad, even though the implementation requires looking up information via the Internet. There are no side effects to speak of, and from a practical point of view the function is referentially transparent (OEIS A-numbers could change in theory, but it's extremely unlikely).

Can I create a function in Haskell that will encapsulate reading data from file and returning me a simple list of data?

Consider the code below taken from a working example I've built to help me learn Haskell. This code parses a CSV file containing stock quotes downloaded from Yahoo into a nice simple list of bars with which I can then work.
My question: how can I write a function that will take a file name as its parameter and return an OHLCBarList so that the first four lines inside main can be properly encapsulated?
In other words, how can I implement (without getting all sorts of errors about IO stuff) the function whose type would be
getBarsFromFile :: Filename -> OHLCBarList
so that the grunt work that was being done in the first four lines of main can be properly encapsulated?
I've tried to do this myself but with my limited Haskell knowledge, I'm failing miserably.
{-# LANGUAGE ScopedTypeVariables #-}
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as C
import Data.Attoparsec.ByteString.Char8 (Parser, char, double, decimal, parseOnly)
import Data.Time (UTCTime)
-- Definitions of Bar, Bar1Day, timeParser, getContentsOfFile and barsToBarList are omitted here.
type Filename = String
getContentsOfFile :: Filename -> IO BS.ByteString
barParser :: Parser Bar
barParser = do
  time <- timeParser
  char ','
  open <- double
  char ','
  high <- double
  char ','
  low <- double
  char ','
  close <- double
  char ','
  volume <- decimal
  char ','
  return $ Bar Bar1Day time open high low close volume
type OHLCBar = (UTCTime, Double, Double, Double, Double)
type OHLCBarList = [OHLCBar]
barsToBarList :: [Either String Bar] -> OHLCBarList
main :: IO ()
main = do
  contents :: C.ByteString <- getContentsOfFile "PriceData/Daily/yhoo1.csv" --PriceData/Daily/Yhoo.csv"
  let lineList :: [C.ByteString] = C.lines contents -- Break the contents into a list of lines
  let bars :: [Either String Bar] = map (parseOnly barParser) lineList -- Using the attoparsec
  let ohlcBarList :: OHLCBarList = barsToBarList bars -- Now I have a nice simple list of tuples with which to work
  --- Now I can do simple operations like
  print $ ohlcBarList !! 0
If you really want your function to have type Filename -> OHLCBarList, it can't be done.* Reading the contents of a file is an IO operation, and Haskell's IO monad is specifically designed so that values in the IO monad can never leave. If this restriction were broken, it would (in general) mess with a lot of things. Instead of doing this, you have two options: make the type of getBarsFromFile be Filename -> IO OHLCBarList — thus essentially copying the first four lines of main — or write a function with type C.ByteString -> OHLCBarList that the output of getContentsOfFile can be piped through to encapsulate lines 2 through 4 of main.
* Technically, it can be done, but you really, really, really shouldn't even try, especially if you're new to Haskell.
Others have explained that the correct type of your function has to be Filename -> IO OHLCBarList, I'd like to try and give you some insight as to why the compiler imposes this draconian measure on you.
Imperative programming is all about managing state: "do certain operations to certain bits of memory in sequence". When they grow large, procedural programs become brittle; we need a way of limiting the scope of state changes. OO programs encapsulate state in classes but the paradigm is not fundamentally different: you can call the same method twice and get different results. The output of the method depends on the (hidden) state of the object.
Functional programming goes all the way and bans mutable state entirely. A Haskell function, when called with certain inputs, will always produce the same output. Simple examples of
pure functions are mathematical operators like + and *, or most of the list-processing functions like map. Pure functions are all about the inputs and outputs, not managing internal state.
This allows the compiler to be very smart in optimising your program (for example, it can safely collapse duplicated code for you), and helps the programmer not to make mistakes: you can't put the system in an invalid state if there is none! We like pure functions.
The exception to the rule is IO. Code that performs IO is impure by definition: you could call getLine a hundred times and never get the same result, because it depends on what the user typed. Haskell handles this using the type system: all impure functions are marked with the IO type. IO can be thought of as a dependency on the state of the real world, sort of like World -> (NewWorld, a).
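To make the World -> (NewWorld, a) intuition concrete, here is a toy model with an explicit world value. It is only an analogy: GHC does not expose IO this way to user code, and World, ToyIO, toyGetLine, toyPutStrLn and echoTwice are all hypothetical. The difference that matters is that with the toy you can call runToyIO on any world you like, whereas real IO gives you no such function (short of unsafePerformIO), so an IO String never turns into a plain String.

```haskell
-- Hypothetical "world": pending input lines and output produced so far.
data World = World { pendingInput :: [String], producedOutput :: [String] }

-- An action is a function from the old world to a new world plus a result.
newtype ToyIO a = ToyIO { runToyIO :: World -> (World, a) }

instance Functor ToyIO where
  fmap f (ToyIO g) = ToyIO $ \w -> let (w', a) = g w in (w', f a)

instance Applicative ToyIO where
  pure a = ToyIO $ \w -> (w, a)
  ToyIO gf <*> ToyIO ga = ToyIO $ \w ->
    let (w1, f) = gf w
        (w2, a) = ga w1
    in (w2, f a)

instance Monad ToyIO where
  ToyIO g >>= k = ToyIO $ \w ->
    let (w', a) = g w
    in runToyIO (k a) w'   -- the next action sees the updated world

toyGetLine :: ToyIO String
toyGetLine = ToyIO $ \w -> case pendingInput w of
  (l:ls) -> (w { pendingInput = ls }, l)
  []     -> (w, "")        -- simplification: end of input yields ""

toyPutStrLn :: String -> ToyIO ()
toyPutStrLn s = ToyIO $ \w -> (w { producedOutput = producedOutput w ++ [s] }, ())

-- The same action gives different results in different worlds.
echoTwice :: ToyIO String
echoTwice = do
  a <- toyGetLine
  b <- toyGetLine
  toyPutStrLn (a ++ b)
  pure (a ++ b)
```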
To summarise: pure functions are good because they are easy to reason about; this is why Haskell makes functions pure by default. Any impure code has to be labelled as such with an IO type signature; this tells the compiler and the reader to be careful with this function. So your function which reads from a file (a fundamentally impure action) but returns a pure value can't exist.
Addendum in response to your comment
You can still write pure functions to operate on data that was obtained impurely. Consider the following straw-man:
main :: IO ()
main = do
  putStrLn "Enter the numbers you want me to process, separated by spaces"
  line <- getLine
  let numberStrings = words line
  let numbers = map read numberStrings
  putStrLn $ "The result of the calculation is " ++ (show $ foldr1 (*) numbers + 10)
Lots of code inside IO here. Let's extract some functions:
main :: IO ()
main = do
  putStrLn "Enter the numbers you want me to process, separated by spaces"
  result <- fmap processLine getLine -- fmap :: (a -> b) -> IO a -> IO b
                                     -- runs an impure result through a pure function
                                     -- without leaving IO
  putStrLn $ "The result of the calculation is " ++ result
processLine :: String -> String -- look ma, no IO!
processLine = show . calculate . readNumbers
readNumbers :: String -> [Int]
readNumbers = map read . words
calculate :: [Int] -> Int
calculate numbers = multiplyAll numbers + 10
-- Named multiplyAll rather than product, which would clash with Prelude's product.
multiplyAll :: [Int] -> Int
multiplyAll = foldr1 (*)
I've pulled logic out of main into pure functions which are easier to read, easier for the compiler to optimise, and more reusable (and so more testable). The program as a whole still lives inside IO because the data is obtained impurely (see the last part of this answer for a more thorough treatment of this argument). Impure data can be piped through pure functions using fmap and other combinators; you should try to put as little logic in main as possible.
Your code does seem to be most of the way there; as others have suggested you could extract lines 2-4 of your main into another function.
In other words, how can I implement (without getting all sorts of errors about IO stuff) the function whose type would be
getBarsFromFile :: Filename -> OHLCBarList
so that the grunt work that was being done in the first four lines of main can be properly encapsulated?
You cannot do this without getting all sorts of errors about IO stuff because this type for getBarsFromFile misses an IO. Probably that's what the errors about IO stuff are trying to tell you. Did you try understanding and fixing the errors?
In your situation, I would start by abstracting over the second to fourth line of your main in a function:
parseBars :: ByteString -> OHLCBarList
And then I would combine this function with getContentsOfFile to get:
getBarsFromFile :: FilePath -> IO OHLCBarList
This I would call in main.
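A base-only sketch of that split, with a deliberately simplified record: it assumes a CSV layout of date,open,high,low,close,... and represents a bar as a (Double, Double, Double, Double) tuple instead of the question's UTCTime-based OHLCBar parsed with attoparsec. parseBars is pure and easy to test; getBarsFromFile is the only function that touches IO. Unlike the question's [Either String Bar], this version silently drops lines it cannot parse (such as the header).

```haskell
import Data.Maybe (mapMaybe)
import Text.Read (readMaybe)

-- Simplified stand-in for OHLCBar: (open, high, low, close).
type Bar = (Double, Double, Double, Double)

splitOn :: Char -> String -> [String]
splitOn sep s = case break (== sep) s of
  (field, [])       -> [field]
  (field, _ : rest) -> field : splitOn sep rest

-- Pure: parse one line "date,open,high,low,close,..." (hypothetical layout).
parseBar :: String -> Maybe Bar
parseBar line = case splitOn ',' line of
  (_date : o : h : l : c : _) ->
    (,,,) <$> readMaybe o <*> readMaybe h <*> readMaybe l <*> readMaybe c
  _ -> Nothing

-- Pure: the work done by lines 2 to 4 of the question's main.
parseBars :: String -> [Bar]
parseBars = mapMaybe parseBar . lines

-- The only IO: read the file, then map the pure parser over its contents.
getBarsFromFile :: FilePath -> IO [Bar]
getBarsFromFile path = parseBars <$> readFile path
```

main then shrinks to bars <- getBarsFromFile "PriceData/Daily/yhoo1.csv" followed by whatever pure processing you need.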

Is there any way to use IO Bool in if-statement without binding to a name in haskell?

If I've got a function that returns IO Bool (specifically an atomically), is there any way to use the return value directly in the if statement, without binding?
So currently I've got
ok <- atomically $ do
  ...
if (ok) then do
  ...
else do
  ...
Is it at all possible to write this as something like
if (*some_operator_here* atomically $ do
      ...) then do
  ...
else do
  ...
I was hoping there'd be a way to use something like <- anonymously, i.e., if (<- atomically ...) but so far no such luck.
Similarly on getLine, is it possible to write something like
if ((*operator* getLine) == "1234") then do ...
Related addendum--what is the type of (<-)? I can't get it to show up in ghci. I'm assuming it's m a -> a, but then that would mean it could be used outside of a monad to escape that monad, which would be unsafe, right? Is (<-) not a function at all?
You can use ifM from Control.Conditional if that suits your purpose, and it's not even hard to write a similar function.
Just to give you example
import Control.Conditional
import Control.Monad
(==:) :: ( Eq a,Monad m) => m a -> m a -> m Bool
(==:) = liftM2 (==)
main = ifM (getLine ==: getLine) (print "hit") (print "miss")
I think there are ways, using the RebindableSyntax extension, to even use if c then e1 else e2-like syntax for ifM, but it is not worth the effort to try that.
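If you would rather not depend on the library that provides Control.Conditional, a hand-rolled ifM and (==:) take only a few lines. This is a sketch, not that library's own definitions; note that (==:) runs both actions, left to right.

```haskell
import Control.Applicative (liftA2)

-- Run the monadic condition, then run exactly one of the two branches.
ifM :: Monad m => m Bool -> m a -> m a -> m a
ifM cond onTrue onFalse = do
  b <- cond
  if b then onTrue else onFalse

-- Compare the results of two actions (both are always run).
(==:) :: (Eq a, Applicative f) => f a -> f a -> f Bool
(==:) = liftA2 (==)
infix 4 ==:
```

For the question's getLine case you could then write ifM (fmap (== "1234") getLine) (putStrLn "ok") (putStrLn "wrong").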
With GHC 7.6 and the LambdaCase language extension, you can write
{-# LANGUAGE LambdaCase #-}
import System.Directory
main = do
  doesFileExist "/etc/passwd" >>= \case
    True -> putStrLn "Yes"
    False -> putStrLn "No"
It is not exactly if..then..else, but close enough, does not require binding the result to a name, and some people (not me) say that if..then..else is bad style in Haskell anyway.
No, you cannot. Well, to be honest, there is a 'hack' that will allow you to at least write code like this and get it to compile, but the results will almost certainly not be what you wanted or expected.
Why is this not possible? Well, for one thing, a value of type IO Bool does not in any sense contain a value of type Bool. Rather, it is an 'action' that, when performed, will return a value of type Bool. For another, if this were possible, it would allow you to hide side effects inside what appears to be pure code. This would violate a core principle of Haskell. And Haskell is very principled.
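On the related question of the type of (<-): it is not a function but part of do-notation's syntax. A statement ok <- action followed by the rest of the block desugars to action >>= \ok -> rest, and (>>=) :: Monad m => m a -> (a -> m b) -> m b only hands the Bool to a continuation whose result is still in m. The hypothetical checkPin below writes the getLine == "1234" test from the question with >>= and no do-binding; checkPinDo is the equivalent do version.

```haskell
-- Without a named binding: fmap the comparison over the action, then
-- pass the resulting Bool straight to a lambda via (>>=).
checkPin :: IO String -> IO String
checkPin readInput =
  fmap (== "1234") readInput >>= \ok ->
    if ok then pure "access granted" else pure "access denied"

-- The same check with do-notation; "ok <- ..." is sugar for the >>= above.
checkPinDo :: IO String -> IO String
checkPinDo readInput = do
  ok <- fmap (== "1234") readInput
  if ok then pure "access granted" else pure "access denied"
```

In a real program you would pass getLine, e.g. main = checkPin getLine >>= putStrLn.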
