Escaping monad IO


One of the things I like best about Haskell is how the compiler locates side effects via the IO monad in function signatures. However, it seems easy to bypass this type check by importing two GHC primitives:
{-# LANGUAGE MagicHash #-}
import GHC.Magic (runRW#)
import GHC.Types (IO(..))

hiddenPrint :: ()
hiddenPrint = case putStrLn "Hello !" of
  IO sideEffect -> case runRW# sideEffect of
    _ -> ()
hiddenPrint has type unit, yet it triggers a side effect when evaluated (it prints Hello !). Is there a way to forbid such hidden IO, other than trusting that no one imports GHC's primitives?

This is the purpose of Safe Haskell. If you add {-# language Safe #-} to the top of your source file, you will only be allowed to import modules that are either inferred safe or labeled {-# language Trustworthy #-}. This also imposes some mild restrictions on overlapping instances.
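As a minimal sketch of what this looks like in practice (the function name shout is made up for illustration), the module below opts into Safe Haskell and imports only a module from base that is usable from Safe code. The commented-out imports name two modules that base marks as unsafe, so enabling either one should make GHC reject the module.

```haskell
{-# LANGUAGE Safe #-}
-- Data.Char can be imported from Safe code.
import Data.Char (toUpper)

-- Either of these imports should be rejected under Safe,
-- because both modules are marked Unsafe in base:
-- import System.IO.Unsafe (unsafePerformIO)
-- import Unsafe.Coerce (unsafeCoerce)

-- Hypothetical example function; it can only be built from safe imports.
shout :: String -> String
shout = map toUpper
```

Loading this in GHCi and evaluating shout "abc" gives "ABC"; the point is only that the module compiles with the Safe pragma in place.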

There are many ways in which you can break purity in Haskell. However, you have to go out of your way to find them. Here are a few:
Importing GHC internal modules to access low-level primitives
Using the FFI to import a C function as a pure one, when it is not pure
Using unsafePerformIO
Using unsafeCoerce
Using Ptr a and dereferencing pointers that do not point to valid data
Declaring your own Typeable instances, and lying so that you can abuse cast. (No longer possible in recent GHC)
Using the unsafe array operations
All these, and others, are not meant to be used regularly. They are there so that, if you are really, really sure, you can tell the compiler "I know what I am doing, don't get in my way". In doing so, you take on the burden of proof: proving that what you are doing is safe.
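To make the unsafePerformIO entry in the list above concrete, here is a small sketch. The name hiddenPrintUnsafe is invented for this example; it mirrors the hiddenPrint from the question but needs no GHC internals.

```haskell
import System.IO.Unsafe (unsafePerformIO)

-- The type mentions no IO, yet forcing the value prints to stdout.
-- NOINLINE asks GHC not to duplicate the hidden action by inlining it.
{-# NOINLINE hiddenPrintUnsafe #-}
hiddenPrintUnsafe :: ()
hiddenPrintUnsafe = unsafePerformIO (putStrLn "Hello !")
```

Because hiddenPrintUnsafe is a top-level constant, forcing it a second time typically prints nothing: the result () is already cached. That is exactly the kind of behaviour the type system can no longer describe once purity is bypassed.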

Related

Can I avoid explicitly deriving built-in Haskell typeclasses over and over again?

I have a large type hierarchy in Haskell.
Counting family instances, which (can) have separate class membership after all, there are hundreds of data types.
Since the top-most type needs to implement built-in classes like Generic,Eq,Ord,Show, every single type in the hierarchy has to as well for a meaningful implementation overall. So my specification contains hundreds of times deriving (Generic,Eq,Ord,Show), which I would like to avoid cluttering the files.
A solution involving a single typeclass to attach everywhere like deriving GEOS with a single automatic derivation from that to the usual set in a centralized place would already help a lot with readability.
Another question asking for similar conciseness in constraints is solved by using constraint synonyms (so my GEOS would be not just linked to but explicitly made up of exactly the classes I want), which however apparently are currently prevented from being instantiated.
(A side question of mine would be why that is so. It seems to me like the reason @simonpj gives about the renamer not knowing what the type checker knows the synonym really to be would only apply to explicitly written out instance implementations.)
Maybe GHC.Generics itself (and something like generic-deriving) could help here?

You could of course use Template Haskell to generate the deriving clauses as standalone deriving declarations (-XStandaloneDeriving).
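{-# LANGUAGE TemplateHaskell #-}
module GEOSDerive where

import Language.Haskell.TH
import Control.Monad
import GHC.Generics

-- Generate one standalone deriving declaration per class for the given type.
-- The [t| ... |] type quotes need TemplateHaskell (QuasiQuotes alone is not enough).
deriveGEOS :: Q Type -> DecsQ
deriveGEOS t = do
  t' <- t
  forM [ [t|Generic|], [t|Eq|], [t|Ord|], [t|Show|] ] $ \c -> do
    c' <- c
    return $ StandaloneDerivD Nothing [] (AppT c' t')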
Then,
{-# LANGUAGE TemplateHaskell, StandaloneDeriving, QuasiQuotes, DeriveGeneric #-}
import GEOSDerive
data Foo = Foo
deriveGEOS [t|Foo|]
But I find it somewhat dubious that you need so many types in the first place, or rather that you have so many types but each of them has so little code associated with it that you're bothered about mentioning those four classes for each of them. There is nothing to be concerned about regarding refactoring with those, so I'd rather recommend simply sticking with deriving (Generic, Eq, Ord, Show) for each of them.

Has anyone ever compiled a list of the imports needed to avoid the "not polymorphic enough" definitions in Haskell's standard libraries?

I have been using Haskell for quite a while. The more I use it, the more I fall in love with the language. I simply cannot believe I have spent almost 15 years of my life using other languages.
However, I am slowly but steadily growing fed up with Haskell's standard libraries. My main pet peeve is the "not polymorphic enough" definitions (Prelude.map, Control.Monad.forM_, etc.). I have a lot of Haskell source code files whose first lines look like
{-# LANGUAGE NoMonomorphismRestriction #-}
module Whatever where
import Control.Monad.Error hiding (forM_, mapM_)
import Control.Monad.State hiding (forM_, mapM_)
import Data.Foldable (forM_, mapM_)
{- ... -}
In order to avoid constantly hoogling which definitions I should hide, I would like to have a single source file, or a small number of them, that wraps this import boilerplate into manageable units.
So...
Has anyone else tried doing this before?
If the answer to the previous question is "Yes", have they posted the resulting boilerplate-wrapping source code files?

It is not as clear cut as you imagine it to be. I will list all the disadvantages I can think of off the top of my head:
First, there is no limit to how general these functions can get. For example, right now I am writing a library for indexed types that subsumes ordinary types. Every function you mentioned has a more general indexed equivalent. Do I expect everybody to switch to my library for everything? No.
Here's another example. The mapM function defines a higher order functor that satisfies the functor laws in the Kleisli category:
mapM return = return
mapM (f >=> g) = mapM f >=> mapM g
So I could argue that your traversable generalization is the wrong one and instead we should generalize it as just being an instance of a higher order functor class.
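The two laws above can be spot-checked on concrete inputs. The sketch below does so for the Maybe monad only, using two made-up Kleisli arrows (halve and positive). It is a sanity check, not a proof, and it says nothing about monads in which the order of effects is observable: there the two sides of the composition law sequence their effects differently, so the check should not be read as covering them.

```haskell
import Control.Monad ((>=>))

-- Hypothetical Kleisli arrows in Maybe, used only for illustration.
halve :: Int -> Maybe Int
halve n
  | even n    = Just (n `div` 2)
  | otherwise = Nothing

positive :: Int -> Maybe Int
positive n
  | n > 0     = Just n
  | otherwise = Nothing

-- mapM return = return, specialised to Maybe.
identityLaw :: [Int] -> Bool
identityLaw xs = mapM return xs == (return xs :: Maybe [Int])

-- mapM (f >=> g) = mapM f >=> mapM g, specialised to halve and positive.
compositionLaw :: [Int] -> Bool
compositionLaw xs =
  mapM (halve >=> positive) xs == (mapM halve >=> mapM positive) xs
```

In GHCi, all compositionLaw [[], [2,4], [2,3], [2,-4]] evaluates to True, and so does all identityLaw on the same lists.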
Also, check out the category-extras package for some examples of these higher order classes and functions which subsume all your examples.
There is also the issue of performance. Many of these more specialized functions have really finely tuned implementations that dramatically help performance. Sometimes classes expose ways to admit more performant versions, but sometimes they don't.
There is also the issue of typeclass overload. I actually prefer to minimize use of typeclasses unless they have sound laws derived from theory rather than convenience. Also, typeclasses generally play poorly with the monomorphism restriction and I enjoy writing functions without signatures for my application code.
There is also the issue of taste. A lot of people simply don't agree what is the best Haskell style. We still can't even agree on the Prelude. Speaking of which, there have been many attempts to write new Preludes, but nobody can agree on what is best so we all default back to the Haskell98 one anyway.
However, I think the overall spirit of improving things is good and the worst enemy of progress is satisfaction, but don't assume there will be a clear-cut right way to do everything.

Is there a list of GHC extensions that are considered 'safe'?

Occasionally, a piece of code I want to write isn't legal without at least one language extension. This is particularly true when trying to implement ideas in research papers, which tend to use whichever spiffy, super-extended version of GHC was available at the time the paper was written, without making it clear which extensions are actually required.
The result is that I often end up with something like this at the top of my .hs files:
{-# LANGUAGE TypeFamilies
, MultiParamTypeClasses
, FunctionalDependencies
, FlexibleContexts
, FlexibleInstances
, UndecidableInstances
, OverlappingInstances #-}
I don't mind that, but often I feel as though I'm making blind sacrifices to appease the Great God of GHC. It complains that a certain piece of code isn't valid without language extension X, so I add a pragma for X. Then it demands that I enable Y, so I add a pragma for Y. By the time this finishes, I've enabled three or four language extensions that I don't really understand, and I have no idea which ones are 'safe'.
To explain what I mean by 'safe':
I understand that UndecidableInstances is safe, because although it may cause the compiler to not terminate, as long as the code compiles it won't have unexpected side effects.
On the other hand, OverlappingInstances is clearly unsafe, because it makes it very easy for me to accidentally write code that gives runtime errors.
Is there a list of GHC extensions which are considered 'safe' and which are 'unsafe'?

It's probably best to look at what SafeHaskell allows:
Safe Language
The Safe Language (enabled through -XSafe) restricts things in two different ways:
Certain GHC LANGUAGE extensions are disallowed completely.
Certain GHC LANGUAGE extensions are restricted in functionality.
Below is precisely what flags and extensions fall into each category:
Disallowed completely: GeneralizedNewtypeDeriving, TemplateHaskell
Restricted functionality: OverlappingInstances, ForeignFunctionInterface, RULES, Data.Typeable
See Restricted Features below
Doesn't Matter: all remaining flags.
Restricted and Disabled GHC Haskell Features
In the Safe language dialect we restrict the following Haskell language features:
ForeignFunctionInterface: This is mostly safe, but foreign import declarations that import a function with a non-IO type are disallowed. All FFI imports must reside in the IO Monad.
RULES: As they can change the behaviour of trusted code in unanticipated ways, violating semantic consistency they are restricted in function. Specifically any RULES defined in a module M compiled with -XSafe are dropped. RULES defined in trustworthy modules that M imports are still valid and will fire as usual.
OverlappingInstances: This extension can be used to violate semantic consistency, because malicious code could redefine a type instance (by containing a more specific instance definition) in a way that changes the behaviour of code importing the untrusted module. The extension is not disabled for a module M compiled with -XSafe but restricted. While M can define overlapping instance declarations, they can only be used in M. If in a module N that imports M, at a call site that uses a type-class function there is a choice of which instance to use (i.e. overlapping) and the most specific choice is from M (or any other Safe compiled module), then compilation will fail. It is irrelevant if module N is considered Safe, or Trustworthy or neither.
Data.Typeable: We allow instances of Data.Typeable to be derived but we don't allow hand crafted instances. Derived instances are machine generated by GHC and should be perfectly safe but hand crafted ones can lie about their type and allow unsafe coercions between types. This is in the spirit of the original design of SYB.
In the Safe language dialect we disable completely the following Haskell language features:
GeneralizedNewtypeDeriving: It can be used to violate constructor access control, by allowing untrusted code to manipulate protected data types in ways the data type author did not intend. I.e., it can be used to break invariants of data structures.
TemplateHaskell: Is particularly dangerous, as it can cause side effects even at compilation time and can be used to access abstract data types. It is very easy to break module boundaries with TH.
I recall having read that the interaction of FunctionalDependencies and UndecidableInstances can also be unsafe, because beyond allowing an unlimited context stack depth UndecidableInstances also lifts the so-called coverage condition (section 7.6.3.2), but I can't find a cite for this at the moment.
EDIT 2015-10-27: Ever since GHC gained support for type roles, GeneralizedNewtypeDeriving is no longer unsafe. (I'm not sure what else might have changed.)
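To make the ForeignFunctionInterface restriction quoted above concrete, here is a small sketch with two imports of the C function sin. The Haskell-side names c_sinPure and c_sinIO are made up for this example. In ordinary Haskell both declarations are accepted; per the quoted text, only the IO-typed one would be allowed in a module compiled with -XSafe.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.Types (CDouble)

-- Pure-typed import: the programmer vouches that sin has no side effects.
-- According to the restriction quoted above, -XSafe disallows this form.
foreign import ccall unsafe "math.h sin"
  c_sinPure :: CDouble -> CDouble

-- IO-typed import: the form that remains allowed under -XSafe.
foreign import ccall unsafe "math.h sin"
  c_sinIO :: CDouble -> IO CDouble
```

The pure import is harmless here because sin really is pure; the restriction exists because the compiler has no way to check that claim for an arbitrary C function.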

Purity of functions generating ByteString (or any object with ForeignPtr component)

Since a ByteString is a constructor with ForeignPtr:
data ByteString = PS {-# UNPACK #-} !(ForeignPtr Word8) -- payload
                     {-# UNPACK #-} !Int                -- offset
                     {-# UNPACK #-} !Int                -- length
If I have a function that returns ByteString, then given an input, say a constant Word8, the function will return a ByteString with non-deterministic ForeignPtr value - as to what that value will be is determined by the memory manager.
So, does that mean that a function that returns ByteString is not pure? That doesn't seem to be the case obviously, if you have used ByteString and Vector libraries. Surely, it would have been discussed widely if it were the case (and hopefully show up on top of google search). How is that purity enforced?
The reason for asking this question is I am curious what are the subtle points involved in using ByteString and Vector objects, from the GHC compiler perspective, given ForeignPtr member in their constructor.

There is no way to observe the value of the pointer inside the ForeignPtr from outside the Data.ByteString module; its implementation is internally impure, but externally pure, because it makes sure that the invariants required to be pure are maintained as long as you cannot see inside the ByteString constructor — which you can't, because it's not exported.
This is a common technique in Haskell: implementing something with unsafe techniques under the hood, but exposing a pure interface; you get both the performance and power unsafe techniques bring, without compromising Haskell's safety. (Of course, the implementation modules can have bugs, but do you think ByteString would be less likely to leak its abstraction if it was written in C? :))
As far as the subtle points go, if you're talking from a user's perspective, don't worry: you can use any function the ByteString and Vector libraries export without worrying, as long as they don't start with unsafe. They are both very mature and well-tested libraries, so you shouldn't run into any purity problems at all, and if you do, that's a bug in the library, and you should report it.
As far as writing your own code that provides external safety with an unsafe internal implementation, the rule is very simple: maintain referential transparency.
Taking ByteString as an example, the functions to construct ByteStrings use unsafePerformIO to allocate blocks of data, which they then mutate and put in the constructor. If we exported the constructor, then user code would be able to get at the ForeignPtr. Is this problematic? To determine whether it is, we need to find a pure function (i.e. not in IO) that lets us distinguish two ForeignPtrs allocated in this way. A quick glance at the documentation shows that there is such a function: instance Eq (ForeignPtr a) would let us distinguish these. So we must not allow user code to access the ForeignPtr. The easiest way to do this is to not export the constructor.
In summary: When you use an unsafe mechanism to implement something, verify that the impurity it introduces cannot leak outside of the module, e.g. by inspecting the values you produce with it.
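The pattern described above, an impure implementation behind a pure interface, can be sketched on a much smaller scale than ByteString. The function sumViaRef below is a made-up example: it allocates a mutable reference, mutates it, and returns only the final value, so the reference itself never escapes and callers cannot observe anything but the result.

```haskell
import Data.IORef (modifyIORef', newIORef, readIORef)
import System.IO.Unsafe (unsafePerformIO)

-- Hypothetical example: internally impure, externally pure.
-- The IORef is created and consumed inside the unsafePerformIO block,
-- so two calls with the same list always yield the same Int.
sumViaRef :: [Int] -> Int
sumViaRef xs = unsafePerformIO $ do
  acc <- newIORef 0
  mapM_ (\x -> modifyIORef' acc (+ x)) xs
  readIORef acc
```

For example, sumViaRef [1..10] is 55. For this particular job runST with an STRef would give the same local mutation with no unsafe function at all; unsafePerformIO is used here only to mirror the technique the answer describes.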
As far as compiler issues go, you shouldn't really have to worry about them; while the functions are unsafe, they shouldn't allow you to do anything more dangerous, beyond violating purity, than you can do in the IO monad to start with. Generally, if you want to do something that could produce really unexpected results, you'll have to go out of your way to do so: for instance, you can use unsafeDupablePerformIO if you can deal with the possibility of two threads evaluating the same thunk of the form unsafeDupablePerformIO m simultaneously. unsafePerformIO is slightly slower than unsafeDupablePerformIO because it prevents this from happening. (Thunks in your program can be evaluated by two threads simultaneously during normal execution with GHC; this is normally not a problem, as evaluating the same pure value twice should have no adverse side-effects (by definition), but when writing unsafe code, it's something you have to take into account.)
The GHC documentation for unsafePerformIO (and unsafeDupablePerformIO, as I linked above) details some pitfalls you might run into; similarly the documentation for unsafeCoerce# (which should be used through its portable name, Unsafe.Coerce.unsafeCoerce).

TemplateHaskell and IO

Is there any proper way to make TH's functions safe if they use side effects? Say, I want to have a function that calls git in compile time and generates a version string:
{-# LANGUAGE TemplateHaskell #-}
module Qq where
import System.Process
import Language.Haskell.TH
version = $( [| (readProcess "git" ["rev-parse", "HEAD"] "") |] )
The type of version is IO String. But version is completely free of side effects at runtime; it has side effects only at compile time. Is there any way to make it pure at runtime without using unsafePerformIO?

First: normally, the runtime type of the generated code is independent of the compile-time type of the Template Haskell subexpressions, so the runtime type doesn't have to be in IO.
Now, to run this command without using unsafePerformIO, use runIO. You will then have to construct the Exp yourself, without using [| |] (this also solves the type problem).
Actually, if you use [| |] to insert an IO computation, I think it will only insert the computation, not run it, anyway. But that's an irrelevant aside, because regardless of what it does, that's not the right way to do what you want to do.
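A minimal sketch of the runIO approach described above: the git command runs inside the splice at compile time, and its output is turned into a string literal with stringE, so version ends up as a plain String. The fallback to "unknown" when git cannot be run, and the trimming of the trailing newline, are additions for this example and not part of the original question.

```haskell
{-# LANGUAGE TemplateHaskell #-}
import Control.Exception (IOException, try)
import Language.Haskell.TH (runIO, stringE)
import System.Process (readProcess)

-- The splice runs git while the module is being compiled and embeds the
-- result as a string literal, so no IO remains at runtime.
version :: String
version =
  $( do
       r <- runIO (try (readProcess "git" ["rev-parse", "HEAD"] "")
                     :: IO (Either IOException String))
       -- Assumed fallback: use "unknown" if git is missing or fails.
       stringE (either (const "unknown") (takeWhile (/= '\n')) r)
   )
```

Note that the value is fixed when this module is compiled; it will not change until the module is recompiled, even if the repository moves to a new commit.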
