TemplateHaskell and IO - haskell

Is there any proper way to make TH's functions safe if they use side effects? Say I want a function that calls git at compile time and generates a version string:
{-# LANGUAGE TemplateHaskell #-}
module Qq where
import System.Process
import Language.Haskell.TH
version = $( [| (readProcess "git" ["rev-parse", "HEAD"] "") |] )
The type of version is IO String. But version is completely free of side effects at run time; it has side effects only at compile time. Is there any way to make it pure at run time without using unsafePerformIO?

First: normally, the runtime type of the generated code is independent of the compile-time type of the Template Haskell subexpressions, so the runtime type doesn't have to be in IO.
Now, to run this command without using unsafePerformIO, use runIO. You will then have to construct the Exp yourself, without using [| |] (this also solves the type problem).
Actually, if you use [| |] to insert an IO computation, I think it will only insert the computation, not run it, anyway. But that's an irrelevant aside, because regardless of what it does, that's not the right way to do what you want to do.
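A minimal sketch of the runIO approach, assuming the process package is available and git is on the PATH at build time. The try wrapper and the fallback string "unknown" are additions for illustration, not part of the answer; without them a failing git call would abort compilation.

```haskell
{-# LANGUAGE TemplateHaskell #-}
import Control.Exception (SomeException, try)
import Language.Haskell.TH (runIO, stringE)
import System.Process (readProcess)

-- The splice runs git while this module is being compiled and is replaced
-- by a string literal, so the runtime type is plain String, not IO String.
version :: String
version = $(do
  result <- runIO (try (readProcess "git" ["rev-parse", "HEAD"] ""))
  stringE $ case result of
    -- Assumed fallback: git missing, or not inside a repository.
    Left err  -> const "unknown" (err :: SomeException)
    -- Keep only the first line, dropping the trailing newline.
    Right out -> takeWhile (/= '\n') out)
```

Loading this in GHCi and evaluating version yields the hash (or "unknown"). Note that GHC only reruns the splice when it recompiles the module, so the embedded hash can go stale.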

Related

a simple question about a simple http get request of a json string in haskell.. :(

I'm trying to learn haskell, and what better a way than to learn by converting an already existing program I have made over to haskell, since I know how my program works otherwise.
the first step is to make a simple http get request to a link that provides me a JSON string. I have been digging, and dumpster diving through as much haskell documentation as I can, but I am getting the idea that haskell documentation is obviously... not accessible to non-haskell programmers.
I need to take a link, let's say https://aur.archlinux.org/rpc/?v=5&type=search&by=name-desc&arg=brave
and make a GET request on that link. In other languages, there seemed to be an easy way to get the body of that response as a STRING. In Haskell I'm racking my brain with bytestrings? and "you can't do this because main is IO" and so on.
I just can't make any sense of it and I want to break through this accessibility barrier that Haskell has, because I otherwise love functional programming and I don't want to program any other way!
{-# LANGUAGE OverloadedStrings #-}
import Network.HTTP.Simple
import qualified Data.ByteString.Char8 as B8
main :: IO ()
main = do
  httpBS "https://aur.archlinux.org/rpc/?v=5&type=search&by=name-desc&arg=brave" >>= B8.putStrLn . getResponseBody
Doing this gets the response and outputs it as standard out, but I need to save this as a string, or convert it from a bytestring? to a string so that I can parse it.
If I sound defeated it's because I very much am.
> what better a way than to learn by converting an already existing program I have made over to haskell, since I know how my program works otherwise.
This strategy will encourage you to carry over idioms from the original language. I'm guessing your prior language didn't encourage use of monads and wasn't lazy. Perhaps it wasn't even functional, but the point is this can actually be a detrimental start.
> I just can't make any sense of it and I want to breach through this accessibility barrier
Learning to read documentation is hard in any unknown language (for me at least). Understanding each character is actually rather important. For example, a ByteString - an array of bytes, or just Bytes if it were better named - isn't a hard concept but the name implies things to people.
> If I sound defeated it's because I very much am.
You're so close! Think of the high level:
1. Get the data with an HTTP GET.
2. Parse the data from the string (or bytes) into a structure. Python uses a dictionary, for example.
You did 1 already, nice work. Rather than do everything in a single line in point-free style (composing a bunch of functions with a bunch of operators), let's name our intermediate values and have one concept per line:
#!/usr/bin/env cabal
{- cabal:
build-depends: base, http-conduit, aeson
-}
{-# LANGUAGE OverloadedStrings #-}
I'm using a shebang so I can just chmod +x file.hs and ./file.hs as I develop.
import Network.HTTP.Simple
import qualified Data.Aeson as Aeson
The most common JSON library in Haskell is Aeson; we'll use it to parse the bytes into JSON, much like Python's json.loads.
main :: IO ()
main = do
  httpRequest <- parseRequest "https://aur.archlinux.org/rpc/?v=5&type=search&by=name-desc&arg=brave"
  response <- httpLBS httpRequest
Your start is good. Notice that httpBS belongs to a family of functions named http<SomeTypeOfResult>, rather than being an httpGet like you might have seen in Java or Python. The function learns whether this is a GET (vs POST etc.) and which headers to send from fields in the Request data type. To get a Request we parse the URL string and just use all of parseRequest's defaults (which means HTTP GET).
I did change to getting lazy byte strings (LBS) because I know the Aeson library uses those later on. This is something like an iterator that produces bytestrings (for intuition, not entirely accurate).
Rather than >>= moreFunctions I'm naming the intermediate value response so we can use it and look at each step separately.
  let body = getResponseBody response
Extracting the body from the response is just like what you had, except as a separate expression.
  let obj = Aeson.decode body :: Maybe Aeson.Object
The big part is decoding the bytes to JSON. This is hopefully familiar, since nearly every language decodes JSON to some sort of dictionary/map/object. In Haskell it is less common to decode to a map and more common to define a structure that is explicit about what you expect to find in the JSON, then write a custom decoding routine for that type using the FromJSON class. You don't have to do that; it brings in far more concepts than you'll want when just getting started.
  print obj
I know this doesn't need explaining.
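For reference, the FromJSON approach mentioned above might look roughly like this. It is only a sketch: the record types Package and SearchResult are hypothetical names, and the keys "resultcount", "results", "Name" and "Version" are an assumption about the shape of the AUR response that should be checked against the real output. The sample body is made up so the code can be tried without network access, and it needs bytestring added to build-depends.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Aeson (FromJSON (..), eitherDecode, withObject, (.:))
import qualified Data.ByteString.Lazy.Char8 as L8

-- Hypothetical records holding only the fields we care about.
data Package = Package
  { pkgName    :: String
  , pkgVersion :: String
  } deriving (Show, Eq)

data SearchResult = SearchResult
  { resultCount :: Int
  , results     :: [Package]
  } deriving (Show, Eq)

-- Each instance says which JSON keys map to which record fields.
instance FromJSON Package where
  parseJSON = withObject "Package" $ \o ->
    Package <$> o .: "Name" <*> o .: "Version"

instance FromJSON SearchResult where
  parseJSON = withObject "SearchResult" $ \o ->
    SearchResult <$> o .: "resultcount" <*> o .: "results"

-- Made-up stand-in for the lazy ByteString returned by getResponseBody.
sampleBody :: L8.ByteString
sampleBody =
  "{\"resultcount\":1,\"results\":[{\"Name\":\"brave-bin\",\"Version\":\"1.0\"}]}"

-- eitherDecode reports why parsing failed instead of returning Nothing.
decoded :: Either String SearchResult
decoded = eitherDecode sampleBody
```

In the walkthrough above, body could be passed to eitherDecode in place of sampleBody. Keys not mentioned in the instances are simply ignored.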
Alternative
If you saw the documentation you might have seen (or considered searching the page for) JSON. This can save you lots of time!
#!/usr/bin/env cabal
{- cabal:
build-depends: base, http-conduit, aeson
-}
{-# LANGUAGE OverloadedStrings #-}
import Network.HTTP.Simple
import qualified Data.Aeson as Aeson
main :: IO ()
main = do
  response <- httpJSON =<< parseRequest "https://aur.archlinux.org/rpc/?v=5&type=search&by=name-desc&arg=brave"
  let obj = getResponseBody response :: Maybe Aeson.Object
  print obj

Escaping monad IO

One of the things I like best about Haskell is how the compiler tracks side effects via the IO monad in function signatures. However, it seems easy to bypass this type check by importing two GHC primitives:
{-# LANGUAGE MagicHash #-}
import GHC.Magic(runRW#)
import GHC.Types(IO(..))
hiddenPrint :: ()
hiddenPrint = case putStrLn "Hello !" of
  IO sideEffect -> case runRW# sideEffect of
    _ -> ()
hiddenPrint is of type unit, but it does trigger a side effect when evaluated (it prints Hello !). Is there a way to forbid those hidden IOs (other than trusting that no one imports GHC's primitives)?
This is the purpose of Safe Haskell. If you add {-# language Safe #-} to the top of your source file, you will only be allowed to import modules that are either inferred safe or labeled {-# language Trustworthy #-}. This also imposes some mild restrictions on overlapping instances.
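As a small illustration (a sketch, not taken from the answer): a module that opts into Safe Haskell can still import ordinary library modules, while an import of System.IO.Unsafe, which base marks as unsafe, is rejected at compile time. The function name sortedCopy is made up for the example.

```haskell
{-# LANGUAGE Safe #-}
import Data.List (sort)

-- Uncommenting the next line makes the module fail to compile, because
-- System.IO.Unsafe is marked unsafe and cannot be imported from Safe code:
-- import System.IO.Unsafe (unsafePerformIO)

-- Ordinary pure code is unaffected by the Safe pragma.
sortedCopy :: [Int] -> [Int]
sortedCopy = sort
```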
There are many ways in which you can break purity in Haskell. However, you have to go out of your way to find them. Here are a few:
- Importing GHC internal modules to access low-level primitives
- Using the FFI to import a C function as a pure one, when it is not pure
- Using unsafePerformIO
- Using unsafeCoerce
- Using Ptr a and dereferencing pointers that do not point to valid data
- Declaring your own Typeable instances, and lying so that you can abuse cast. (No longer possible in recent GHC)
- Using the unsafe array operations
All these, and others, are not meant to be used regularly. They are there so that, if one is really, really sure, one can tell the compiler "I know what I am doing, don't get in my way". In doing so, you take on the burden of proof: you must show that what you are doing is safe.

Is there a "Safe" alternative to Data.String.Utils's 'replace' in Haskell?

I'm unable to mark as "Safe" code containing, for example
import Data.String.Utils (replace)
preproc :: String -> String
preproc s = foldl1 fmap (uncurry replace <$> subs) s
  where subs = [("1","a"),("2","bb"),("3","z"),("4","mr"),("0","xx")]
because (apparently) Data.String.Utils is not "safe".
Is there a safe alternative to replace? And why isn't replace safe anyway?
tl;dr: import Data.Text (replace) - if you can live with the more restricted type signature?
1) The Data.String.Utils module is not tagged as safe, although it should be.
2) The Data.String.Utils module is safe. It's wrong to call it "not safe", even if you put quotes around "safe". GHC reports the module as unsafe because it uses a conservative approach: if it can't prove at compile time that the module is safe, it assumes that it is unsafe. But no matter how loudly the compiler complains, the module itself remains perfectly safe.
3) On the other hand, it would be possible to write a module, export some version of unsafePerformIO, and mark it as "Trustworthy". GHC would think that the module can be safely imported. But in fact, the module is inherently unsafe.
So, what are your options now?
A) Download the source of the package, and modify the modules that you need, and for which you know that they are safe, to include a "Trustworthy" tag at the beginning: {-# LANGUAGE Trustworthy #-}
(You may send a patch to the maintainer, or you may keep it to yourself)
B) You write your own version of replace and mark it as safe.
C) Maybe you can use replace from Data.Text. But that is limited to Text, whereas the other replace function works on arbitrary lists.
At least on Hoogle there are no other methods with a [a] -> [a] -> [a] -> [a] signature for your use-case.
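Option B could be sketched as follows. This is a hand-written stand-in, not the MissingH implementation; in particular, leaving the input unchanged for an empty search pattern is a choice made here and may differ from the original function.

```haskell
{-# LANGUAGE Safe #-}
import Data.List (isPrefixOf)

-- Replace every non-overlapping occurrence of old with new, scanning
-- from left to right.
replace :: Eq a => [a] -> [a] -> [a] -> [a]
replace [] _ xs = xs  -- assumption: an empty pattern changes nothing
replace old new xs = go xs
  where
    n = length old
    go [] = []
    go ys@(y : rest)
      | old `isPrefixOf` ys = new ++ go (drop n ys)
      | otherwise           = y : go rest

-- The question's preproc; the first pair in subs is applied last,
-- as in the original composition.
preproc :: String -> String
preproc s = foldr (\(old, new) acc -> replace old new acc) s subs
  where
    subs = [("1","a"),("2","bb"),("3","z"),("4","mr"),("0","xx")]
```

Because it only depends on Data.List, the module compiles under the Safe pragma without touching the third-party package.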

Template Haskell: GHC stage restriction and how to overcome

I have the following code in a module:
{-# LANGUAGE TemplateHaskell #-}
module Alpha where
import Language.Haskell.TH
import Data.List
data Alpha = Alpha { name :: String, value :: Int } deriving (Show)
findName n = find ((== n) . name)
findx obj = sequence [valD pat bod []]
  where
    nam = name obj
    pat = varP (mkName $ "find" ++ nam)
    bod = normalB [| findName nam |]
And then I have the following in the main file:
{-# LANGUAGE TemplateHaskell #-}
import Alpha
one = Alpha "One" 1
two = Alpha "Two" 2
three = Alpha "Three" 3
xs = [one, two , three]
findOne = findName "One"
findTwo = findName "Two"
$(findx three) -- This Fails
$(findx (Alpha "Four" 4)) -- This Works
main = putStrLn "Done"
I'd like the $(findx three) to create findThree = findName "Three" for me. But instead, I get this error:
GHC stage restriction: `three'
is used in a top-level splice or annotation,
and must be imported, not defined locally
In the first argument of `findx', namely `three'
In the expression: findx three
How do I overcome this? I would rather not have to define one, two, etc. in a separate file.
Second question is why does $(findx (Alpha "Four" 4)) work without problems?
I'm not very across Template Haskell myself, but based on my limited understanding the problem is that three is in some sense "still being defined" when GHC is trying to compile $(findx three), while all the component pieces of $(findx (Alpha "Four" 4)) are already fully defined.
The fundamental issue is that all the definitions in the same module affect the meaning of each other. This is due to type inference as well as mutual recursion. The definition x = [] could mean lots of different things, depending on the context; it could be binding x to a list of Int, or a list of IO (), or anything else. GHC might have to process the whole module to figure out exactly what it does mean (or that it's actually an error).
The code that Template Haskell emits into the module that's being compiled has to be considered by that analysis. So that means the Template Haskell code has to run before GHC has figured out what the definitions in the module mean, and so logically you can't use any of them.
Things that have been imported from other modules OTOH have already been fully checked when GHC compiled that module. There is no more information that needs to be learned about them by compiling this module. So those can be accessed and used before the compilation of the code in this module.
Another way to think about it: maybe three isn't actually supposed to be of type Alpha. Maybe that was a typo and the constructor should have been Alphz. Normally GHC finds out about those sorts of errors by compiling all the other code in the module that uses three to see whether that introduces an inconsistency or not. But what if that code uses or is used by things that are only emitted by $(findx three)? We don't even know what code that's going to be until we run it, but we can't settle the question of whether three is properly typed until after we run it.
It would of course be possible to lift this restriction a bit in certain cases (I have no idea whether it would be easy or practical). Maybe we could make GHC consider something to be "defined early" if it is imported or if it only uses other things that are "defined early" (and perhaps has an explicit type signature). Maybe it could try compiling the module without running the TH code, and if it manages to fully typecheck three before it runs into any errors, it could feed that into the TH code and then recompile everything. The downside (besides the work involved) would be making it much more complicated to state what the exact restrictions are on what you can pass to Template Haskell.

Is there ever a good reason to use unsafePerformIO?

The question says it all. More specifically, I am writing bindings to a C library, and I'm wondering which C functions I can use unsafePerformIO with. I assume using unsafePerformIO with anything involving pointers is a big no-no.
It would be great to see other cases where it is acceptable to use unsafePerformIO too.
No need to involve C here. The unsafePerformIO function can be used in any situation where
1. you know that its use is safe, and
2. you are unable to prove its safety using the Haskell type system.
For instance, you can make a memoize function using unsafePerformIO:
import Control.Concurrent.MVar (modifyMVar, newMVar)
import qualified Data.Map as Map
import System.IO.Unsafe (unsafePerformIO)

memoize :: Ord a => (a -> b) -> a -> b
memoize f = unsafePerformIO $ do
  memo <- newMVar $ Map.empty
  return $ \x -> unsafePerformIO $ modifyMVar memo $ \memov ->
    return $ case Map.lookup x memov of
      Just y  -> (memov, y)
      Nothing -> let y = f x
                 in (Map.insert x y memov, y)
(This is off the top of my head, so I have no idea if there are flagrant errors in the code.)
The memoize function uses and modifies a memoization dictionary, but since the function as a whole is safe, you can give it a pure type (with no use of the IO monad). However, you have to use unsafePerformIO to do that.
Footnote: When it comes to the FFI, you are responsible for providing the types of the C functions to the Haskell system. You can achieve the effect of unsafePerformIO by simply omitting IO from the type. The FFI system is inherently unsafe, so using unsafePerformIO doesn't make much of a difference.
Footnote 2: There are often really subtle bugs in code that uses unsafePerformIO, the example is just a sketch of a possible use. In particular, unsafePerformIO can interact poorly with the optimizer.
In the specific case of the FFI, unsafePerformIO is meant to be used for calling things that are mathematical functions, i.e. the output depends solely on the input parameters, and every time the function is called with the same inputs, it will return the same output. Also, the function shouldn't have side effects, such as modifying data on disk, or mutating memory.
Most functions from <math.h> could be called with unsafePerformIO, for example.
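The usual illustration of this is importing sin from math.h with a pure type, which has the same effect as wrapping an IO import in unsafePerformIO. A sketch, where the names c_sin and pureSin are made up for the example:

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.Types (CDouble (..))

-- Leaving IO out of the type is a promise to GHC that the C function is
-- pure: same input, same output, no side effects.
foreign import ccall "math.h sin"
  c_sin :: CDouble -> CDouble

-- Wrapper using ordinary Haskell Doubles.
pureSin :: Double -> Double
pureSin = realToFrac . c_sin . realToFrac
```

GHC does not check the promise; it simply believes the declared type.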
You're correct that unsafePerformIO and pointers don't usually mix. For example, suppose you have
double p_sin(double *p) { return sin(*p); }
Even though you're just reading a value from a pointer, it's not safe to use unsafePerformIO. If you wrap p_sin, multiple calls can use the pointer argument, but get different results. It's necessary to keep the function in IO to ensure that it's sequenced properly in relation to pointer updates.
This example should make clear one reason why this is unsafe:
# file export.c
#include <math.h>
double p_sin(double *p) { return sin(*p); }
# file main.hs
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.Ptr
import Foreign.Marshal.Alloc
import Foreign.Storable
foreign import ccall "p_sin"
  p_sin :: Ptr Double -> Double
foreign import ccall "p_sin"
  safeSin :: Ptr Double -> IO Double
main :: IO ()
main = do
  p <- malloc
  let sin1 = p_sin p
      sin2 = safeSin p
  poke p 0
  putStrLn $ "unsafe: " ++ show sin1
  sin2 >>= \x -> putStrLn $ "safe: " ++ show x
  poke p 1
  putStrLn $ "unsafe: " ++ show sin1
  sin2 >>= \x -> putStrLn $ "safe: " ++ show x
When compiled, this program outputs
$ ./main
unsafe: 0.0
safe: 0.0
unsafe: 0.0
safe: 0.8414709848078965
Even though the value referenced by the pointer has changed between the two references to "sin1", the expression isn't re-evaluated, leading to stale data being used. Since safeSin (and hence sin2) is in IO, the program is forced to re-evaluate the expression, so the updated pointer data is used instead.
Obviously if it should never be used, it wouldn't be in the standard libraries. ;-)
There are a number of reasons why you might use it. Examples include:
- Initialising global mutable state. (Whether you should ever have such a thing in the first place is a whole other discussion...)
- Lazy I/O is implemented using this trick. (Again, whether lazy I/O is a good idea in the first place is debatable.)
- The trace function uses it. (Yet again, it turns out trace is rather less useful than you might imagine.)
- Perhaps most significantly, you can use it to implement data structures which are referentially transparent, but internally implemented using impure code. Often the ST monad will let you do that, but sometimes you need a little unsafePerformIO.
Lazy I/O can be seen as a special-case of the last point. So can memoisation.
Consider, for example, an "immutable", growable array. Internally you could implement that as a pure "handle" that points to a mutable array. The handle holds the user-visible size of the array, but the actual underlying mutable array is larger than that. When the user "appends" to the array, a new handle is returned, with a new, larger size, but the append is performed by mutating the underlying mutable array.
You can't do this with the ST monad. (Or rather, you can, but it still requires unsafePerformIO.)
Note that it's damned tricky to get this sort of thing right. And the type checker won't catch it if you're wrong. (That's what unsafePerformIO does; it makes the type checker not check that you're doing it correctly!) For example, if you append to an "old" handle, the correct thing to do would be to copy the underlying mutable array. Forget this, and your code will behave very strangely.
Now, to answer your real question: There's no particular reason why "anything involving pointers" should be a no-no for unsafePerformIO. When asking whether to use this function or not, the only question of significance is this: Can the end-user observe any side-effects from doing this?
If the only thing it does is create some buffer somewhere that the user can't "see" from pure code, that's fine. If it writes to a file on disk... not so fine.
HTH.
The standard trick to instantiate global mutable variables in Haskell:
{-# NOINLINE bla #-}
bla :: IORef Int
bla = unsafePerformIO (newIORef 10)
I also use it to close over the global variable if I want to prevent access to it outside of functions I provide:
{-# NOINLINE printJob #-}
printJob :: String -> Bool -> IO ()
printJob = unsafePerformIO $ do
  p <- newEmptyMVar
  return $ \a b -> do
    -- here's the function code doing something
    -- with variable p, no one else can access.
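A complete, if contrived, instance of the same closing-over pattern is a counter whose state cannot be reached from anywhere else. This is a sketch; the name nextId and the use of an IORef instead of an MVar are choices made for the example.

```haskell
import Data.IORef (atomicModifyIORef', newIORef)
import System.IO.Unsafe (unsafePerformIO)

-- NOINLINE keeps GHC from duplicating the unsafePerformIO call, which
-- would otherwise create several independent counters.
{-# NOINLINE nextId #-}
nextId :: IO Int
nextId = unsafePerformIO $ do
  ref <- newIORef (0 :: Int)  -- hidden state, visible only to the closure
  -- Each run of the returned action hands out the current value and
  -- stores its successor.
  return $ atomicModifyIORef' ref (\n -> (n + 1, n))
```

Running nextId twice in a row in GHCi returns two consecutive numbers, and no other code can reset or read the counter.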
The way I see it, the various unsafe* nonfunctions really should only be used in cases where you want to do something that respects referential transparency but whose implementation would otherwise require augmenting the compiler or runtime system to add a new primitive capability. It's easier, more modular, readable, maintainable and agile to use the unsafe stuff than to have to modify the language implementation for things like that.
FFI work often intrinsically requires you to do this sort of thing.
Sure. You can have a look at a real example here but in general, unsafePerformIO is usable on any pure function that happens to be side-effecting. The IO monad may still be needed to track effects (e.g. freeing memory after the value is computed) even when the function is pure (e.g. computing a factorial).
> I'm wondering what c functions I can use unsafePerformIO with. I assume using unsafePerformIO with anything involving pointers is a big no-no.
Depends! unsafePerformIO will fully perform actions and force out all the laziness, but that doesn't mean it will break your program. In general, Haskellers prefer unsafePerformIO to appear only in pure functions, so you can use it on results of e.g. scientific computations but maybe not file reads.
