Haskell: use of unsafePerformIO for global constant bindings - haskell

There are lots of discussions of using unsafePerformIO carefully for global mutable variables, and some language additions to support it (e.g. Data.Global). I have a related but distinct question: using it for global constant bindings. Here’s a usage I consider entirely OK: command-line parsing.
module Main where
--------------------------------------------------------------------------------
import Data.Bool (bool)
import Data.Monoid ((<>))
import Options.Applicative (short, help, execParser, info, helper, fullDesc,
                            progDesc, long, switch)
import System.IO.Unsafe (unsafePerformIO)
--------------------------------------------------------------------------------
data CommandLine = CommandLine
  Bool --quiet
  Bool --verbose
  Bool --force

commandLineParser = CommandLine
  <$> switch
      ( long "quiet"
     <> short 'q'
     <> help "Show only error messages.")
  <*> switch
      ( long "verbose"
     <> short 'v'
     <> help "Show lots of detail.")
  <*> switch
      ( long "force"
     <> short 'f'
     <> help "Do stuff anyway.")

{- Parse the command line, and bind related values globally for
   convenience. This use of unsafePerformIO is OK since the action has no
   side effects and it's idempotent. -}
CommandLine cQuiet cVerbose cForce
  = unsafePerformIO . execParser $ info (helper <*> commandLineParser)
      ( fullDesc
     <> progDesc "example program"
      )

-- Print a message:
say     = say' $ not cQuiet -- unless --quiet
verbose = say' cVerbose     -- if --verbose
say'    = bool (const $ return ()) putStrLn
--------------------------------------------------------------------------------
main :: IO ()
main = do
  verbose "a verbose message"
  say "a regular message"
It is very valuable to be able to refer to cQuiet, cVerbose, etc. globally rather than having to pass them around as arguments wherever they’re needed. After all, this is exactly what global identifiers are for: these have a single value that never changes during any run of the program; it just happens that the value is initialized from the outside world rather than declared in the program text.
It makes sense in principle to do the same thing with other sorts of constant data fetched from the outside, e.g. settings from a configuration file, but then an extra point arises: the action which fetches those is not idempotent, unlike reading the command line (I’m slightly abusing the term “idempotent” here, but trust that I’m understood). This just adds the constraint that the action must be performed only once. My question is: what’s the best way to do that with code of this form:
data Config = Config String (Maybe String) Int -- foo, bar, baz
readConfig :: IO Config
readConfig = do …
Config foo bar baz = unsafePerformIO readConfig
The System.IO.Unsafe documentation suggests to me that this is sufficient and that none of the precautions mentioned there are needed, but I’m not sure. I’ve seen proposals for adding a top-level syntax inspired by do-notation specifically for such situations:
Config foo bar baz <- readConfig
… which seems like a very good idea; I’d rather be sure the action will be performed at most once than rely on various compiler settings and hope no compiler behavior comes along that breaks existing code.
I feel the fact that these are in fact constants, together with the ugliness involved in passing such things around explicitly despite the fact that they never change, argue strongly for there being a safe and supported way to do this. I’m open to hearing contrary opinions if someone thinks I’m missing an important point here, though.
Updates
The say and verbose uses in the example are not the best, because values in the IO monad are not the real annoyance: those could easily read the parameters from a global IORef. The problem is the pervasive use of such parameters in pure code, all of which would have to be rewritten either to take the parameters explicitly (even though these do not change and thus should not need to be function parameters) or to run in IO, which is even worse. I’ll improve the example when I have time.
Another way to think about this: the class of behaviors I’m talking about could be obtained in the following clunky way: run a program that fetches some data via I/O; take the results and substitute them into the template text of the main program as the values of some global bindings; then compile and run the resulting main program. You would then safely have the advantage of referring to those constants easily throughout the program. It seems that it should not be so hard to implement this pattern directly. I phrased the question mentioning unsafePerformIO, but really I’m interested in understanding this kind of behavior, and what the best way to obtain it would be. unsafePerformIO is one way, but it has drawbacks.
Known limitations:
With unsafePerformIO, when the data-fetching action happens is not fixed. This may be a feature, so that e.g. an error related to a missing configuration parameter occurs if and only if that parameter is ever actually used. If you need different behavior, you’ll have to force the values with seq as needed.
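A minimal sketch of the “perform it only once” pattern under GHC, assuming the configuration can be read into a single-constructor type: bind the whole value in one NOINLINE top-level CAF, derive the individual constants from it, and force it with evaluate at a known point. The Config shape, the readCount counter, and the dummy readConfig body are hypothetical stand-ins; the counter exists only to make the “runs once” property observable.

```haskell
import Control.Exception (evaluate)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import System.IO.Unsafe (unsafePerformIO)

-- Hypothetical single-constructor config, so a whole-record binding is possible.
data Config = Config String (Maybe String) Int
  deriving Show

-- Counter used only to observe how many times readConfig actually runs.
readCount :: IORef Int
readCount = unsafePerformIO (newIORef 0)
{-# NOINLINE readCount #-}

-- Stand-in for a real reader (assumption: the real one would parse a file).
readConfig :: IO Config
readConfig = do
  modifyIORef' readCount (+ 1)
  pure (Config "foo" Nothing 42)

-- One shared binding; NOINLINE stops GHC from copying the unsafePerformIO
-- call into each use site, which could make the action run more than once.
config :: Config
config = unsafePerformIO readConfig
{-# NOINLINE config #-}

-- Individual "global constants" are projections of the single binding.
foo :: String
foo = case config of Config f _ _ -> f

baz :: Int
baz = case config of Config _ _ z -> z

-- Force the config at a well-defined point (e.g. first thing in main), so a
-- failure while reading it surfaces there rather than at some later use.
forceConfig :: IO ()
forceConfig = () <$ evaluate config
```

NOINLINE is one of the precautions the System.IO.Unsafe documentation lists for this idiom. It is a compiler convention that GHC honors, not a language-level guarantee like the proposed top-level `<-` syntax.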

I don't know if I'd consider top-level command line parsing to always be OK! Specifically, observe what happens with this alternate main when the user provides bad input.
main = do
  putStrLn "Arbitrary program initialization"
  verbose "a verbose message"
  say "a regular message"
  putStrLn "Clean shutdown"
> ./commands -x
Arbitrary program initialization
Invalid option `-x'
Usage: ...
Now in this case you can force one (or all!) of the pure values so that the parser is known to have run by a well-defined point in time.
main = do
  () <- return $ cQuiet `seq` cVerbose `seq` cForce `seq` ()
  -- ...
> ./commands -x
Invalid option `-x'
...
But what happens if you have something like this?
forkIO (withArgs newArgs action)
The only sensible thing to do is add {-# NOINLINE cQuiet #-} and friends, so some of the precautions in System.IO.Unsafe do apply to you. But even once this case is patched over, note that you have given up the ability to run sub-computations with alternate values. A ReaderT solution using local, for example, doesn't have that drawback.
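A minimal sketch of the `local` point, assuming a hypothetical Options record. It uses the pure Reader from the transformers package rather than ReaderT over IO, so the effect of the flag shows up as a returned list of messages instead of printed output.

```haskell
import Control.Monad.Trans.Reader (Reader, asks, local, runReader)

-- Hypothetical options record standing in for the parsed command line.
newtype Options = Options { optQuiet :: Bool }

-- The messages this step would print under the current options.
say :: String -> Reader Options [String]
say msg = do
  quiet <- asks optQuiet
  pure (if quiet then [] else [msg])

-- Run a sub-computation with --quiet switched off, whatever the caller had.
loudly :: Reader Options a -> Reader Options a
loudly = local (\o -> o { optQuiet = False })

program :: Reader Options [String]
program = do
  a <- say "normal message"
  b <- loudly (say "always shown")
  pure (a ++ b)
```

With a top-level constant cQuiet there is no equivalent of loudly: every use sees the one value fixed at startup.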
This seems an even larger drawback to me in the case of reading config files, as long-running applications are usually reconfigurable without requiring a stop/start cycle. A top-level pure value precludes reconfiguration.
But maybe this is even more clear if you consider the intersection of both your config files and your command line arguments. In many utilities arguments on the command line override values provided in a config file, an impossible behavior given what you have now.
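One hedged way to get the “command line overrides config file” behavior is to read both sources in IO, represent flags that were not given as Nothing, and merge them into one plain settings value that is then passed down. The three record types and their fields below are hypothetical.

```haskell
import Data.Maybe (fromMaybe)

-- Values as read from a (hypothetical) configuration file.
data FileConfig = FileConfig { fileQuiet :: Bool, fileLogPath :: FilePath }

-- Command-line flags: Nothing means "not given on the command line".
data CliFlags = CliFlags { cliQuiet :: Maybe Bool, cliLogPath :: Maybe FilePath }

-- The effective settings used by the rest of the program.
data Settings = Settings { quiet :: Bool, logPath :: FilePath }
  deriving (Eq, Show)

-- Command-line values win; anything not given falls back to the file.
merge :: FileConfig -> CliFlags -> Settings
merge file cli = Settings
  { quiet   = fromMaybe (fileQuiet file) (cliQuiet cli)
  , logPath = fromMaybe (fileLogPath file) (cliLogPath cli)
  }
```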
For toys, sure, go hog wild. For anything else, at least make your top-level value an IORef or MVar. There are still some ways to make the non-unsafePerformIO solutions nicer, though. Consider:
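data Config = Config { say     :: String -> IO ()
                     , verbose :: String -> IO ()
                     }

mkSay :: Bool -> String -> IO ()
mkSay quiet s | quiet     = return ()
              | otherwise = putStrLn s

-- mkVerbose is the mirror image of mkSay: print only when the flag is set.
mkVerbose :: Bool -> String -> IO ()
mkVerbose verb s | verb      = putStrLn s
                 | otherwise = return ()

-- In some action...
let config = Config (mkSay quietFlag) (mkVerbose verboseFlag)

compute :: Config -> IO Value
compute config = do
  -- ...
  verbose config "Debugging info"
  -- ...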
This also respects the spirit of Haskell function signatures, in that it's now clear (without even needing to consider the open world of IO) that your functions' behavior actually does depend on program configuration.

-XImplicitParams is useful in this situation.
{-# LANGUAGE ImplicitParams #-}
import Data.Bool (bool)

data CommandLine = CommandLine
  Bool --quiet
  Bool --verbose
  Bool --force

say' :: Bool -> String -> IO ()
say' = bool (const $ return ()) putStrLn

say, verbose :: (?cmdLine :: CommandLine) => String -> IO ()
say     = case ?cmdLine of CommandLine cQuiet _ _ -> say' $ not cQuiet
verbose = case ?cmdLine of CommandLine _ cVerbose _ -> say' cVerbose
Anything that is implicitly typed and uses say or verbose will have the ?cmdLine :: CommandLine implicit parameter added to its type.
:type (\s -> say (show s))
(\s -> say (show s))
:: (Show a, ?cmdLine::CommandLine) => a -> IO ()
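The parameter is supplied by binding it with `let ?cmdLine = ...`, typically once in main right after parsing. The sketch below uses hypothetical sayLines/verboseLines variants that return the lines they would print instead of printing them, so the effect of binding different values is easy to see.

```haskell
{-# LANGUAGE ImplicitParams #-}
import Data.Bool (bool)

data CommandLine = CommandLine
  Bool -- quiet
  Bool -- verbose
  Bool -- force

-- Hypothetical pure variants of say/verbose: the lines that would be printed.
sayLines, verboseLines :: (?cmdLine :: CommandLine) => String -> [String]
sayLines s     = case ?cmdLine of CommandLine q _ _ -> bool [s] [] q
verboseLines s = case ?cmdLine of CommandLine _ v _ -> bool [] [s] v

-- No signature: GHC infers (Show a, ?cmdLine :: CommandLine) => a -> [String].
report n = sayLines ("count: " ++ show n) ++ verboseLines "details..."
```

For example, `let ?cmdLine = CommandLine False True False in report (3 :: Int)` gives both lines, while binding a CommandLine with quiet set and verbose unset gives none.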

Two cases from Hackage that come to mind:
The package cmdargs makes use of unsafePerformIO - treating command line arguments as constant.
In the package oeis, the "pure" function getSequenceByID uses unsafePerformIO to return content from a web page on http://oeis.org. It notes in its documentation:
Note that the result is not in the IO monad, even though the implementation requires looking up information via the Internet. There are no side effects to speak of, and from a practical point of view the function is referentially transparent (OEIS A-numbers could change in theory, but it's extremely unlikely).

Related

Create an IO string

For some reason getLine is not working in my Jupyter notebook. Is there any way to artificially create an IO String so I can continue with some examples that need one?
I tried something like this:
main = do
foo :: IO String
foo << "sdf"
But didn't work obviously. Any way to do that?
Thanks!
A value of type IO String represents an action that may do some IO and produce a String if it’s executed (ultimately, from main). Normally, getLine is such an action that reads a line of text from “standard input” (stdin). If you just want to make your own action to use instead of getLine, you can construct one quite simply using pure :: (Applicative f) => a -> f a where f is IO and a is String in this case:
fakeGetLine :: IO String
fakeGetLine = pure "always the same string"
Aside: pure is also called return :: (Monad m) => a -> m a for historical reasons, but there’s no need to use it, since it’s the same function with a longer name and a slightly more restricted type.
Now, anywhere you would’ve entered getLine, you can replace it with fakeGetLine, or an expression such as pure "whatever string you wish to use". Likewise, if you needed a constant IO Int, you could use pure 123 or pure (123 :: Int).
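Another option that avoids shadowing getLine is to take the input action as an argument, so the same code runs with getLine in a program and with a canned pure action in the notebook. greet here is a hypothetical example function.

```haskell
-- Takes the "read a line" action as a parameter instead of calling getLine
-- directly; callers choose getLine or something like pure "Alice".
greet :: IO String -> IO String
greet readName = do
  name <- readName
  pure ("Hello, " ++ name ++ "!")
```

In a notebook cell, `greet (pure "Alice")` runs without touching stdin; in a real program, `greet getLine` behaves as before.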
You can’t replace the standard getLine, but it is possible to use your own definition with the same name of getLine, provided that you hide the default one from scope using an import…hiding declaration:
import Prelude hiding (getLine)
getLine :: IO String
getLine = pure "always the same string"
Alternatively, you can in fact read from stdin in an IPython notebook, as long as you open it with a frontend that supports the “input request” feature, such as JupyterLab or Deepnote. To the best of my knowledge, this is not currently supported in Hydrogen, nteract, or Spyder.
Finally, a method sometimes recommended for this issue in Python, which will not work in Haskell, is attempting to replace stdin with a different handle. That technique is possible in Python because its stdin is a mutable variable, while in Haskell, stdin (like all variables) is immutable.

Is it possible to store haskell "operational" or "free monad" continuation to disk?

I have some simple primitive operations, for example:
In case of operational monad:
{-# LANGUAGE GADTs #-}
import Control.Monad.Operational

type Process a = Program ProcessI a

data ProcessI a where
  GetInput :: ProcessI String
  Dump     :: String -> ProcessI ()

getInput :: Process String
getInput = singleton GetInput

dump :: String -> Process ()
dump = singleton . Dump
Or in case of free monad:
{-# LANGUAGE DeriveFunctor #-}
import Control.Monad.Free

type Process = Free ProcessF

data ProcessF a
  = GetInput (String -> a)
  | Dump String a
  deriving (Functor)

getInput :: Process String
getInput = liftF $ GetInput id

dump :: String -> Process ()
dump s = liftF $ Dump s ()
Simple action is the same in both cases, for example:
import Control.Monad (forever)

proc1 :: Process ()
proc1 = forever $ do
  a <- getInput
  b <- getInput
  dump $ a ++ b
  dump $ b ++ a
My question is: Is it possible to interpret the process (proc1) in such a way that a continuation in certain step is serialized to disk, then restored during the next program execution? Could you please give an example?
If it's not possible, what would be the closest workaround?
I would like to start the program only when the next input is available, apply the continuation to the input, then interpret until the next "getInput" and exit.
I could imagine logging all inputs, then replaying them to get the system back to the same state before proceeding, but in that case the log would grow without limit. I could not find any way to compact the log in the interpreter, since there is no way to compare continuations (no Eq instance) and the process is infinite.
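The log-and-replay workaround described above can be sketched as follows. To stay dependency-free, it uses a hand-rolled Process type with the same shape as the question's ProcessF (an assumption; with Control.Monad.Free the interpreter would pattern-match on Pure and Free instead). replay feeds logged inputs to the process, collects what it dumps, and returns the continuation that is waiting for the next input. It does nothing about the unbounded log, which remains the real problem.

```haskell
-- Hand-rolled free-monad-style process, equivalent in shape to Free ProcessF.
data Process a
  = GetInput (String -> Process a)
  | Dump String (Process a)
  | Done a

instance Functor Process where
  fmap f (GetInput k) = GetInput (fmap f . k)
  fmap f (Dump s p)   = Dump s (fmap f p)
  fmap f (Done a)     = Done (f a)

instance Applicative Process where
  pure = Done
  pf <*> px = pf >>= \f -> fmap f px

instance Monad Process where
  GetInput k >>= f = GetInput (\s -> k s >>= f)
  Dump s p   >>= f = Dump s (p >>= f)
  Done a     >>= f = f a

getInput :: Process String
getInput = GetInput Done

dump :: String -> Process ()
dump s = Dump s (Done ())

proc1 :: Process ()
proc1 = loop
  where
    loop :: Process ()
    loop = do
      a <- getInput
      b <- getInput
      dump (a ++ b)
      dump (b ++ a)
      loop

-- Feed logged inputs, collect dumped lines, and stop at the first GetInput
-- with no logged input left; that GetInput is the live continuation.
replay :: [String] -> Process a -> ([String], Process a)
replay inputs (Dump s p)       = let (out, rest) = replay inputs p in (s : out, rest)
replay (i : more) (GetInput k) = replay more (k i)
replay [] p@(GetInput _)       = ([], p)
replay _ p@(Done _)            = ([], p)
```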
As I see it, there are two problems:
continuations can contain arbitrary data types
continuations can contain functions (ie closures)
Especially given the second constraint, there's probably no easy way to do exactly what you want.
The discussion on Can Haskell functions be serialized? points to a library called packman. From the Readme:
...the functionality can be used to optimise programs by memoisation (across different program runs), and to checkpoint program execution in selected places. Both uses are exemplified in the slide set linked above.
(The slides it mentions, I think.)
The limitation of this approach is that not all types of data can (or should!) be serialized, notably mutable types like IORef, MVar and STM-related types, and sometimes these end up in thunks and closures leading to runtime exceptions.
Additionally, the library relies on the serialized continuation being taken up by the same binary that created it which may or may not be a real problem for your application.
So you can either get more or less what you want with a slightly limited and complex approach like packman or you can write your own custom logic that serializes to and from a custom type that captures all the information you care about.
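For the custom-type route, one option is to defunctionalize the process by hand: write down, as plain data, everything proc1 needs to remember between inputs, and turn the program into a step function over that state. Because the state contains no closures, Show and Read (or any serialization library) can write it to disk between runs. ProcState, step, saveState and loadState are hypothetical names, and this only works because proc1's state is small and known in advance.

```haskell
import Text.Read (readMaybe)

-- Everything proc1 remembers between inputs, as closure-free data.
data ProcState
  = AwaitFirst          -- waiting for `a`
  | AwaitSecond String  -- got `a`, waiting for `b`
  deriving (Show, Read, Eq)

-- One step of proc1: feed an input, get the dumped lines and the next state.
step :: ProcState -> String -> ([String], ProcState)
step AwaitFirst a      = ([], AwaitSecond a)
step (AwaitSecond a) b = ([a ++ b, b ++ a], AwaitFirst)

-- Text form of the state; a real program would writeFile/readFile it
-- between runs (the file path would be an application choice).
saveState :: ProcState -> String
saveState = show

loadState :: String -> Maybe ProcState
loadState = readMaybe
```

Each run would load the state, apply step to the one new input, print the dumped lines, save the new state, and exit, so nothing grows over time.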

Is it possible to 'read' a function in Haskell?

new user, semi-noobie Haskell programmer here. I've been looking through 'Write yourself a Scheme in 48 hours' and it occurred to me that, though it would be extremely unsafe in practice, it would be interesting to see if a Haskell program could 'read' a function.
For example, read "+" :: Num a => a -> a -> a -- (that is the type of (+) )
The above example did not work, however. Any ideas? I know this is a really dumb thing to do in practice, but it would be really cool if it were possible, right?
Haskell is a statically typed, compiled language, but you can still interpret a string as a function by using Language.Haskell.Interpreter (from the hint package).
A minimal example that reads a binary function with type Int -> Int -> Int is:
import Language.Haskell.Interpreter
import System.Environment (getArgs)

main :: IO ()
main = do
  args <- getArgs
  -- check that head args exists!
  errorOrF <- runInterpreter $ do
    setImports ["Prelude"]
    interpret (head args) (as :: Int -> Int -> Int)
  case errorOrF of
    Left errs -> print errs
    Right f   -> print $ f 1 2
You can call this program in this way (here I assume the filename with the code is test.hs):
> ghc test.hs
...
> ./test "\\x y -> x + y"
3
The core of the program is runInterpreter; that is where the interpreter interprets the String. We first add the Prelude module to the context with setImports to make, for example, the + function available. Then we call interpret to interpret the first argument as a function, and we use (as :: Int -> Int -> Int) to enforce the type.
The result of runInterpreter is an Either InterpreterError a, where a is your type. If the result is Left, you have an error; otherwise you have your function or value. Once you have extracted it from Right, you can use it as you would any Haskell function; see f 1 2 above, for example.
If you want a more complete example, you can check haskell-awk, a project by gelisam and me that implements an awk-like command-line utility using Haskell code instead of AWK code. We use Language.Haskell.Interpreter to interpret the user function.
The general answer is that, no, you cannot. Functions are very "opaque" in Haskell generally—the only way you can analyze them is to apply arguments to them (or use typeclasses to pull information out of the type, but that's different).
This means it's very difficult to create a dynamic function in any sort of specialized or simplified way. The best you can do is embed a parser, interpreter, and serialization/deserialization mechanism to another language and then parse strings of that language and execute them in the interpreter.
Of course, if your interpreted language is just Haskell (such as what you get using the hint package) then you can do what you're looking for.
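A miniature version of the “embed an interpreter for another language” approach, assuming the language is just integer binary operators written as `x op y`: operator names are parsed and looked up in a fixed table of real Haskell functions. readBinOp and evalInfix are hypothetical, and nothing outside the table can ever be “read”.

```haskell
import Text.Read (readMaybe)

-- Only the names in this table can be turned into functions.
readBinOp :: String -> Maybe (Int -> Int -> Int)
readBinOp s = lookup s table
  where
    table :: [(String, Int -> Int -> Int)]
    table = [("+", (+)), ("-", (-)), ("*", (*)), ("max", max), ("min", min)]

-- Evaluate a whitespace-separated "x op y" expression such as "1 + 2".
evalInfix :: String -> Maybe Int
evalInfix src = case words src of
  [x, op, y] -> readBinOp op <*> readMaybe x <*> readMaybe y
  _          -> Nothing
```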

Is there something better than unsafePerformIO for this....?

I've so far avoided ever needing unsafePerformIO, but this might have to change today.... I would like to see if the community agrees, or if someone has a better solution.
I have a library which needs to use some config data stored in a bunch of files. This data is guaranteed static (during the run), but needs to be in files that can (on very rare occasions) be edited by an end user who can not compile Haskell programs. (The details are unimportant, but think of "/etc/mime.types" as a pretty good approximation. It is a large almost static data file used throughout many programs).
If this weren't a library I would just use the IO monad.... But because it is a library which is called throughout my code, it literally forces a bubbling up of the IO monad through pretty much everything I have written in multiple modules! Although I need to do a one time read of the data files, this low level call is effectively pure, so this is a pretty unacceptable outcome.
FYI, I plan to also wrap the call in unsafeInterleaveIO, so that only files that are needed will be loaded. My code will look something like this....
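import Data.List (isPrefixOf)
import System.Directory (getDirectoryContents)
import System.FilePath ((</>))
import System.IO.Unsafe (unsafeInterleaveIO, unsafePerformIO)

dataDir :: FilePath
dataDir = "<path to files>"

datafiles :: [FilePath]
datafiles =
  unsafePerformIO $
  unsafeInterleaveIO $
    map (dataDir </>)
      <$> filter (not . ("." `isPrefixOf`))
      <$> getDirectoryContents dataDir

fileData :: [String]
fileData = unsafePerformIO $ unsafeInterleaveIO $ sequence $ readFile <$> datafiles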
Given that the data read is referentially transparent, I am pretty sure that unsafePerformIO is safe (this has been discussed in many place, such as "Use of unsafePerformIO appropriate?"). Still, though, if there is a better way, I would love to hear about it.
UPDATE-
In response to Anupam's comment....
There are two reasons why I can't break up the lib into IO and non IO parts.
First, the amount of data is large, and I don't want to read it all into memory at once. Remember that IO is always read strictly.... This is the reason that I need to put in the unsafeInterleaveIO call, to make it lazy. IMHO, once you use unsafeInterleaveIO, you might as well use unsafePerformIO, as the risk is already there.
Second, breaking out the IO specific parts just substitutes the bubbling up of the IO monad with the bubbling up of the IO read code, as well as the passing around of the data (I might actually choose to pass around the data using the state monad anyway, so it really isn't an improvement to substitute the IO monad for the state monad everywhere). This wouldn't be so bad if the low level function itself wasn't effectively pure (ie- think of my /etc/mime.types example above, and imagine a Haskell extensionToMimeType function, which is basically pure, but needs to get the database data from the file.... Suddenly everything from low to high in the stack needs to call or pass through a readMimeData::IO String. Why should each main even need to care about the library choice of a submodule many levels deep?).
I agree with Anupam Jain, you would be better off reading these data files at a somewhat higher level, in IO, and then passing the data in them through the rest of your program purely.
You could, for example, put the functions that need the results of fileData into Reader [String], so that they can just ask for the results as needed (or some Reader Config, where Config holds these strings and whatever else you need).
A sketch of what I'm suggesting follows:
import Control.Monad.Reader (Reader, ask, runReader)

type AppResult = String

fileData :: IO [String]
fileData = undefined -- read the files

myApp :: String -> Reader [String] AppResult
myApp s = do
  files <- ask
  return undefined -- do whatever with s and config

main = do
  config <- fileData
  return $ runReader (myApp "test") config
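Applied to the question's /etc/mime.types example, the same shape might look like the sketch below: parse the file once in IO, then keep extensionToMimeType pure by reading the database from the Reader environment. It assumes a simplified mime.types format (a MIME type followed by its extensions, with # starting a comment) and uses Control.Monad.Trans.Reader from transformers; MimeDb and parseMimeDb are hypothetical.

```haskell
import Control.Monad.Trans.Reader (Reader, asks, runReader)
import Data.Char (toLower)

-- Hypothetical parsed database: extension -> MIME type.
type MimeDb = [(String, String)]

-- Parse lines like "text/html html htm" into (extension, type) pairs,
-- skipping blank lines (the pattern fails) and '#' comment lines.
parseMimeDb :: String -> MimeDb
parseMimeDb contents =
  [ (ext, mime)
  | line <- lines contents
  , (mime : exts) <- [words line]
  , take 1 mime /= "#"
  , ext <- exts
  ]

-- Pure lookup; only the environment, not IO, supplies the database.
extensionToMimeType :: String -> Reader MimeDb (Maybe String)
extensionToMimeType ext = asks (lookup (map toLower ext))
```

In main this would be roughly `db <- parseMimeDb <$> readFile "/etc/mime.types"`, after which only the code that runs the Reader needs to know where the data came from.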
I gather that you don't want to read all the data at once, because that would be costly. And maybe you don't really know up-front what files you will need to load, so loading all of them at the start would be wasteful.
Here's an attempt at a solution. It requires you to work inside a free monad and relegate the side-effecting operations to an interpreter. Some preliminary imports:
{-# LANGUAGE OverloadedStrings #-}
module Main where
import qualified Data.ByteString as B
import Data.Monoid
import Data.List
import Data.Functor.Compose
import Control.Applicative
import Control.Monad
import Control.Monad.Free
import System.IO
We define a functor for the free monad. It offers a value of type p to the interpreter and continues the computation after receiving a value of type b:
type LazyLoad p b = Compose ((,) p) ((->) b)
A convenience function to request the loading of a file:
lazyLoad :: FilePath -> Free (LazyLoad FilePath B.ByteString) B.ByteString
lazyLoad path = liftF $ Compose (path,id)
A dummy interpreter function that reads "file contents" from stdin:
interpret :: Free (LazyLoad FilePath B.ByteString) a -> IO a
interpret = iterM $ \(Compose (path,next)) -> do
  putStrLn $ "Enter the contents for file " <> path <> ":"
  B.hGetLine stdin >>= next
Some silly example functions:
someComp :: B.ByteString -> B.ByteString
someComp b = "[" <> b <> "]"
takesAwhile :: Int
takesAwhile = foldl' (+) 0 $ take 400000000 $ intersperse (negate 1) $ repeat 1
An example program:
main :: IO ()
main = do
  r <- interpret $ do
    r1 <- someComp <$> lazyLoad "file1"
    r2 <- return takesAwhile
    if (r2 == 1)
      then return r1
      else someComp <$> lazyLoad "file2"
  putStrLn . show $ r
When executed, this program will request a line, spend some time computing takesAwhile and only then request another line.
If you want to allow different kinds of "requests", this solution could be extended with something like Data types à la carte, so that each function only needs to know about the precise effects it requires.
If you are content with allowing only one type of request, you could also use Clients and Servers from Pipes.Core instead of the free monad.

Monad Transformers vs Passing parameters to functions

I am new to Haskell but understand how Monad Transformers can be used.
Yet I still have difficulty grasping their claimed advantage over passing parameters to function calls.
Based on the wiki Monad Transformers Explained, we basically have a Config Object defined as
data Config = Config Foo Bar Baz
and to pass it around, instead of writing functions with this signature
client_func :: Config -> IO ()
we use a ReaderT Monad Transformer and change the signature to
client_func :: ReaderT Config IO ()
pulling the Config is then just a call to ask.
The function call changes from client_func c to runReaderT client_func c
Fine.
But why does this make my application simpler?
1- I suspect Monad Transformers become useful when you stitch a lot of functions/modules together to form an application. But this is where my understanding stops. Could someone please shed some light?
2- I could not find any documentation on how to write a large modular application in Haskell, where modules expose some form of API and hide their implementations, as well as (partly) hide their own States and Environments from the other modules. Any pointers, please?
(Edit: Real World Haskell states that ".. this approach [Monad Transformers] ... scales to bigger programs.", but there is no clear example demonstrating that claim)
EDIT Following Chris Taylor Answer Below
Chris perfectly explains why encapsulating Config, State, etc. in a Transformer Monad provides two benefits:
It prevents a higher level function from having to maintain in its type signature all the parameters required by the (sub)functions it calls but not required for its own use (see the getUserInput function)
and as a consequence makes higher level functions more resilient to a change of the content of the Transformer Monad (say you want to add a Writer to it to provide Logging in a lower level function)
This comes at the cost of changing the signature of all functions so that they run "in" the Transformer Monad.
So question 1 is fully covered. Thank you Chris.
Question 2 is now answered in this SO post
Let's say that we're writing a program that needs some configuration information in the following form:
data Config = C { logFile :: FilePath }
One way to write the program is to explicitly pass the configuration around between functions. It would be nice if we only had to pass it to the functions that use it explicitly, but sadly we're not sure if a function might need to call another function that uses the configuration, so we're forced to pass it as a parameter everywhere (indeed, it tends to be the low-level functions that need to use the configuration, which forces us to pass it to all the high-level functions as well).
Let's write the program like that, and then we'll re-write it using the Reader monad and see what benefit we get.
Option 1. Explicit configuration passing
We end up with something like this:
readLog :: Config -> IO String
readLog (C logFile) = readFile logFile

-- appendFile avoids lazily reading and then rewriting the same file,
-- which fails at run time because the file is still open for reading.
writeLog :: Config -> String -> IO ()
writeLog (C logFile) message = appendFile logFile message

getUserInput :: Config -> IO String
getUserInput config = do
  input <- getLine
  writeLog config $ "Input: " ++ input
  return input

runProgram :: Config -> IO ()
runProgram config = do
  input <- getUserInput config
  putStrLn $ "You wrote: " ++ input
Notice that in the high level functions we have to pass config around all the time.
Option 2. Reader monad
An alternative is to rewrite using the Reader monad. This complicates the low level functions a bit:
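import Control.Monad.IO.Class (liftIO)
import Control.Monad.Reader (ReaderT, ask)

type Program = ReaderT Config IO

-- IO actions must be lifted into the ReaderT layer with liftIO.
readLog :: Program String
readLog = do
  C logFile <- ask
  liftIO $ readFile logFile

writeLog :: String -> Program ()
writeLog message = do
  C logFile <- ask
  liftIO $ appendFile logFile message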
But as our reward, the high level functions are simpler, because we never need to refer to the configuration file.
getUserInput :: Program String
getUserInput = do
  input <- liftIO getLine
  writeLog $ "Input: " ++ input
  return input

runProgram :: Program ()
runProgram = do
  input <- getUserInput
  liftIO $ putStrLn $ "You wrote: " ++ input
Taking it further
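We could rewrite the type signatures to be the following (getUserInput calls writeLog, so the low-level functions need the general type too; the constraints require the FlexibleContexts extension):
readLog      :: (MonadReader Config m, MonadIO m) => m String
writeLog     :: (MonadReader Config m, MonadIO m) => String -> m ()
getUserInput :: (MonadReader Config m, MonadIO m) => m String
runProgram   :: (MonadReader Config m, MonadIO m) => m ()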
which gives us a lot of flexibility for later, if we decide that we want to change the underlying Program type for any reason. For example, if we want to add modifiable state to our program we could redefine
data ProgramState = PS Int Int Int
type Program a = StateT ProgramState (ReaderT Config IO) a
and we don't have to modify getUserInput or runProgram at all - they'll continue to work fine.
N.B. I haven't type checked this post, let alone tried to run it. There may be errors!
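The claim that a function written against MonadReader survives a change of the underlying stack can be checked with a small pure example (no IO, so results are easy to compare). It assumes mtl, a simplified Config holding only a log path, and a hypothetical describeLog; the same function runs unchanged under Reader Config and under StateT Int (Reader Config).

```haskell
{-# LANGUAGE FlexibleContexts #-}
import Control.Monad.Reader (MonadReader, Reader, asks, runReader)
import Control.Monad.State (StateT, evalStateT, get, modify)

-- Simplified stand-in for the answer's Config.
newtype Config = C { logFile :: FilePath }

-- Written against the MonadReader class, not a concrete stack.
describeLog :: MonadReader Config m => m String
describeLog = asks (\c -> "logging to " ++ logFile c)

-- Concrete stack 1: plain Reader.
viaReader :: Config -> String
viaReader = runReader describeLog

-- Concrete stack 2: State layered on Reader. describeLog is reused as-is,
-- because mtl provides a MonadReader instance that lifts through StateT.
viaState :: Config -> (String, Int)
viaState = runReader (evalStateT counted 0)
  where
    counted :: StateT Int (Reader Config) (String, Int)
    counted = do
      modify (+ 1)
      msg <- describeLog
      n <- get
      pure (msg, n)
```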
