Haskell - Safe and Trustworthy extensions - haskell

Looking at some code on Hackage, I stumbled upon the Safe and Trustworthy extensions.
What do they mean, broadly (or exactly)? Is there a good rule of thumb for when to use them and when not to?

Safe Haskell is, in essence, a subset of the Haskell language. It disables certain "tricks" that are often used, such as unsafePerformIO :: IO a -> a, and it furthermore aims to guarantee that you cannot somehow get access to private functions, data constructors, etc. of a module that deliberately hides them. In short, Safe Haskell guarantees three things:
Referential transparency;
Module boundary control; and
Semantic consistency.
A Safe module has to follow these restrictions, and furthermore may only import modules that are themselves Safe or Trustworthy. Being Safe does not, of course, mean that the code is correct, or that an IO action cannot be malicious. But it does give some guarantees: if a function's type contains no IO, then the module cannot sneak IO in behind that type.
Certain extensions, like TemplateHaskell, are not allowed in Safe Haskell, and {-# RULES … #-} pragmas defined in a Safe module are ignored. Other extensions, like DeriveDataTypeable, are allowed, but only if one uses a deriving clause to generate the instance rather than writing a custom one by hand.
Some modules, however, need to use unsafe features in order to work properly. In that case, the author can mark the module as Trustworthy. That means the author claims the module exposes a safe API, even though internally it needs to use some unsafe extensions, functions, pragmas, etc. The compiler therefore cannot guarantee safety; the author vouches for it instead.
These extensions are described in the GHC user's guide:
The Safe Haskell extension introduces the following three language flags:
-XSafe: Enables the safe language dialect, asking GHC to guarantee trust. The safe language dialect requires that all imports be trusted or a compilation error will occur.
-XTrustworthy: Means that while this module may invoke unsafe functions internally, the module's author claims that it exports an API that can't be used in an unsafe way. This doesn't enable the safe language or place any restrictions on the allowed Haskell code. The trust guarantee is provided by the module author, not GHC. An import statement with the safe keyword results in a compilation error if the imported module is not trusted. An import statement without the keyword behaves as usual and can import any module whether trusted or not.
-XUnsafe: Marks the module being compiled as unsafe so that modules compiled using -XSafe can't import it.
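A minimal sketch of what a Trustworthy module might look like, assuming the common global-counter idiom (counter and nextId are hypothetical names). It uses unsafePerformIO internally, which -XSafe would reject, but it only exposes an IO action, so the author can reasonably claim that the exported API cannot break referential transparency. GHC does not check that claim.

```haskell
{-# LANGUAGE Trustworthy #-}
-- In a real library this file would start with e.g. `module Counter (nextId) where`,
-- exporting only nextId so that the IORef itself stays hidden.
import Data.IORef (IORef, atomicModifyIORef', newIORef)
import System.IO.Unsafe (unsafePerformIO)

-- One global counter, created once. NOINLINE stops GHC from duplicating the
-- unsafePerformIO call, which could otherwise create several independent counters.
counter :: IORef Int
counter = unsafePerformIO (newIORef 0)
{-# NOINLINE counter #-}

-- The public API lives in IO, so pure code cannot observe the mutation.
-- Returns the current value and increments the counter atomically.
nextId :: IO Int
nextId = atomicModifyIORef' counter (\n -> (n + 1, n))
```

A Safe module could then import such a module (optionally with `import safe`), and GHC accepts it on the author's word.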

Related

Should I make my Haskell modules Safe by default?

Is there any downside to marking the modules Safe? Should it be the assumed default?
As you can read in the GHC user manual, if you mark your modules Safe, you are restricted to a certain "safe" subset of the Haskell language.
You can only import modules which are also marked Safe (which rules out functions like unsafeCoerce or unsafePerformIO).
You can't use Template Haskell.
You can't use the FFI to import functions with non-IO types.
You can't use GeneralisedNewtypeDeriving.
You can't manually implement the Generic typeclass for the types you define in your module.
etc.
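A small sketch of the import restriction (wordFrequency is a hypothetical example function). The module is compiled with Safe; the Prelude is fine to import, while the commented-out import would be rejected at compile time because System.IO.Unsafe is marked Unsafe. Ordinary pure code is unaffected.

```haskell
{-# LANGUAGE Safe #-}
-- Under Safe, every import must be Safe or Trustworthy. Uncommenting the
-- next line would make compilation fail, since System.IO.Unsafe is Unsafe:
-- import System.IO.Unsafe (unsafePerformIO)

-- Count how many times a word occurs in a text; plain Prelude code.
wordFrequency :: String -> String -> Int
wordFrequency w = length . filter (== w) . words
```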
In exchange, the users of the module gain a bunch of guarantees (quoting from the manual):
Referential transparency: The types can be trusted. Any pure function is guaranteed to be pure. Evaluating them is deterministic and won't cause any side effects. Functions in the IO monad are still allowed and behave as usual. So, for example, the unsafePerformIO :: IO a -> a function is disallowed in the safe language to enforce this property.
Module boundary control: Only symbols that are publicly available through other module export lists can be accessed in the safe language. Values using data constructors not exported by the defining module cannot be examined or created. As such, if a module M establishes some invariants through careful use of its export list, then code written in the safe language that imports M is guaranteed to respect those invariants.
Semantic consistency: For any module that imports a module written in the safe language, expressions that compile both with and without the safe import have the same meaning in both cases. That is, importing a module written in the safe language cannot change the meaning of existing code that isn't dependent on that module. So, for example, there are some restrictions placed on the use of OverlappingInstances, as these can violate this property.
Strict subset: The safe language is strictly a subset of Haskell as implemented by GHC. Any expression that compiles in the safe language has the same meaning as it does when compiled in normal Haskell.
Note that safety is inferred. If your module is not marked with any of Safe, Trustworthy, or Unsafe, GHC will infer the module to be either Safe or Unsafe. When you set the Safe flag, GHC will issue an error if it decides the module is not actually safe. You can also set -Wunsafe, which emits a warning if a module is inferred to be unsafe. If you let safety be inferred, your module will continue to compile even if your dependencies' safety statuses change. If you write it out, you promise your users that the safety status is stable and dependable.
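As an illustration of the inference described above (average is a hypothetical example function), the module below carries no Safe, Trustworthy or Unsafe pragma, only -Wunsafe. Since it uses nothing but the Prelude, it should be inferred Safe and compile without the warning; importing an Unsafe module such as System.IO.Unsafe would flip the inferred status and trigger the warning, but still not an error.

```haskell
{-# OPTIONS_GHC -Wunsafe #-}
-- No Safe / Trustworthy / Unsafe pragma: GHC infers this module's safety.
-- Uncommenting the import below would make it inferred Unsafe, and
-- -Wunsafe would then emit a warning (compilation still succeeds).
-- import System.IO.Unsafe (unsafePerformIO)

-- Arithmetic mean, returning 0 for an empty list.
average :: [Double] -> Double
average [] = 0
average xs = sum xs / fromIntegral (length xs)
```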
One use case described in the manual refers to running "untrusted" code. If you provide extension points of any kind in your product and you want to make sure that those capabilities aren't used to attack your product, you can require the code for the extension points to be marked Safe.
Marking your module Trustworthy doesn't restrict you in any way while implementing it. Your module might then be used from Safe code, and it is your responsibility not to violate the guarantees that Safe code is supposed to give. So this is a promise that you, the author of the module, make. You can use the flag -fpackage-trust to enable extra checks on Trustworthy modules, as described in the GHC user's guide.
So, if you write normal libraries and don't have a good reason to care about Safe Haskell, then you probably shouldn't care. If your modules are safe, that's fine, and GHC can infer it. If not, there is probably a reason, such as your use of unsafePerformIO, which is also fine. If you know your module will be used in a way that requires it to compile under -XSafe (e.g. plugins, as above), you should mark it. In all other cases, don't bother.

Escaping monad IO

One of the things I like best about Haskell is how the compiler locates the side effects via the IO monad in function signatures. However, it seems easy to bypass this type check by importing two GHC primitives:
{-# LANGUAGE MagicHash #-}
import GHC.Magic (runRW#)
import GHC.Types (IO (..))

hiddenPrint :: ()
hiddenPrint = case putStrLn "Hello !" of
  IO sideEffect -> case runRW# sideEffect of
    _ -> ()
hiddenPrint is of type unit, but it does trigger a side effect when it is evaluated (it prints "Hello !"). Is there a way to forbid those hidden IOs (other than trusting that no one imports GHC's primitives)?
This is the purpose of Safe Haskell. If you add {-# language Safe #-} to the top of your source file, you will only be allowed to import modules that are either inferred safe or labeled {-# language Trustworthy #-}. This also imposes some mild restrictions on overlapping instances.
There are many ways in which you can break purity in Haskell. However, you have to go out of your way to find them. Here are a few:
Importing GHC internal modules to access low-level primitives
Using the FFI to import a C function as a pure one, when it is not pure
Using unsafePerformIO
Using unsafeCoerce
Using Ptr a and dereferencing pointers that don't point to valid data
Declaring your own Typeable instances, and lying so that you can abuse cast. (No longer possible in recent GHC)
Using the unsafe array operations
All these, and others, are not meant to be used regularly. They are there so that someone who is really, really sure can tell the compiler "I know what I am doing, don't get in my way". By doing so, you take on the burden of proof: proving that what you are doing is safe.
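To make one entry from the list above concrete, here is a sketch of how unsafePerformIO hides an effect behind a pure-looking type (noisyAdd is a hypothetical name). Whether, and how often, the message is printed depends on evaluation order, sharing and optimisation, which is exactly why this breaks referential transparency.

```haskell
import System.IO.Unsafe (unsafePerformIO)

-- The type claims a pure function, yet evaluating a call prints a line.
-- A module containing this cannot be compiled with {-# LANGUAGE Safe #-},
-- because System.IO.Unsafe is marked Unsafe.
noisyAdd :: Int -> Int -> Int
noisyAdd x y = unsafePerformIO $ do
  putStrLn "noisyAdd was evaluated"
  return (x + y)
```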

Haskell compiler magic: what requires a special treatment from the compiler?

When trying to learn Haskell, one of the difficulties that arises is knowing when something requires special magic from the compiler. One example that comes to mind is the seq function, which can't be defined in plain Haskell, i.e. you can't write a seq2 function that behaves exactly like the built-in seq. Consequently, when teaching someone about seq, you need to mention that seq is special, because the compiler treats it specially.
Another example would be the do-notation which only works with instances of the Monad class.
Sometimes it's not obvious. Take continuations, for instance. Does the compiler know about Control.Monad.Cont, or is it plain old Haskell that you could have invented yourself? In this case, I think nothing special is required from the compiler, even if continuations are a very strange kind of beast.
Language extensions set aside, what other compiler magic Haskell learners should be aware of?
Nearly all the GHC primitives that cannot be implemented in userland are in the ghc-prim package (it even has a module called GHC.Magic!), so browsing it will give you a good sense of what is built in.
Note that you should not use this package in userland code unless you know exactly what you are doing. Most of the usable stuff from it is re-exported by downstream modules in base, sometimes in modified form. Those downstream locations and APIs are considered more stable, while ghc-prim makes no guarantees about how it will behave from version to version.
The GHC-specific stuff is re-exported in GHC.Exts, but plenty of other things go into the Prelude (such as basic data types, as well as seq), the concurrency libraries, etc.
Polymorphic seq is definitely magic. You can implement seq for any specific type, but only the compiler can implement one function for all possible types (and avoid optimising it away even though it looks like a no-op).
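For a single concrete type, a seq-like function is ordinary Haskell: pattern matching on the outermost constructor forces the argument to weak head normal form. A minimal sketch (seqBool and seqList are hypothetical names); what cannot be written this way is one function that works for every type, including function types, which is why the polymorphic seq is built in.

```haskell
-- Forcing a Bool: matching on True/False evaluates the first argument.
seqBool :: Bool -> b -> b
seqBool True  y = y
seqBool False y = y

-- Forcing a list to weak head normal form: only the outermost constructor
-- ([] or (:)) is evaluated; the elements and the tail are left untouched.
seqList :: [a] -> b -> b
seqList []      y = y
seqList (_ : _) y = y
```

So seqList [undefined] 42 returns 42, because only the (:) cell is forced, whereas seqBool undefined 42 throws.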
Obviously the entire IO monad is deeply magic, as is everything to do with concurrency and parallelism (par, forkIO, MVar), mutable storage, exception throwing and catching, querying the garbage collector and run-time stats, etc.
The IO monad can be considered a special case of the ST monad, which is also magic. (It allows truly mutable storage, which requires low-level stuff.)
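The ST monad gives the same kind of in-place mutation as IO, but runST lets you use it inside a pure function, because the rank-2 type of runST stops the mutable references from escaping. A small sketch using Control.Monad.ST and Data.STRef from base (sumST is a hypothetical name):

```haskell
import Control.Monad.ST (runST)
import Data.STRef (modifySTRef', newSTRef, readSTRef)

-- A pure function whose implementation uses a genuinely mutable reference.
sumST :: [Int] -> Int
sumST xs = runST $ do
  ref <- newSTRef 0                        -- allocate a mutable cell
  mapM_ (\x -> modifySTRef' ref (+ x)) xs  -- update it in place, strictly
  readSTRef ref                            -- read out the final total
```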
The State monad, on the other hand, is completely ordinary user-level code that anybody can write. So is the Cont monad. So are the various exception / error monads.
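To back up the claim that State is ordinary user-level code, here is a minimal version written from scratch with nothing but ordinary type classes. It is a simplified stand-in, not the transformers/mtl definition (which is built on StateT). Note that do-notation works on it simply because it has a Monad instance.

```haskell
newtype State s a = State { runState :: s -> (a, s) }

instance Functor (State s) where
  fmap f (State g) = State $ \s -> let (a, s') = g s in (f a, s')

instance Applicative (State s) where
  pure a = State $ \s -> (a, s)
  State f <*> State g = State $ \s ->
    let (h, s1) = f s
        (a, s2) = g s1
    in (h a, s2)

instance Monad (State s) where
  State g >>= k = State $ \s -> let (a, s') = g s in runState (k a) s'

get :: State s s
get = State $ \s -> (s, s)

put :: s -> State s ()
put s = State $ \_ -> ((), s)

modify :: (s -> s) -> State s ()
modify f = State $ \s -> ((), f s)

-- Increment the state, read it, then double it; returns the value read.
demo :: State Int Int
demo = do
  modify (+ 1)
  x <- get
  put (x * 2)
  return x
```

Starting from 1, runState demo 1 gives (2, 4).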
Anything to do with syntax (do-blocks, list comprehensions) is hard-wired into the language definition. (Note, though, that some of these respond to LANGUAGE RebindableSyntax, which lets you change which functions they desugar to.) The same goes for deriving: the compiler "knows about" a handful of special classes and how to auto-generate instances for them. With the GeneralizedNewtypeDeriving extension, though, deriving for a newtype works for (almost) any class. (It just copies an instance from one type to another, representationally identical, type.)
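A small illustration of newtype deriving, assuming the GeneralizedNewtypeDeriving extension (Meters is a hypothetical type). Num is not one of the classes GHC can derive with stock deriving, but for a newtype it can simply reuse the Double instance.

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

-- Show, Eq and Ord use stock deriving; Num is copied from Double.
newtype Meters = Meters Double
  deriving (Show, Eq, Ord, Num)

-- Numeric literals and (+) now work directly on Meters.
totalLength :: [Meters] -> Meters
totalLength = sum
```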
Arrays are hard-wired. Much like every other programming language.
All of the foreign function interface is clearly hard-wired.
STM can be implemented in user code (I've done it), but it's currently hard-wired. (I imagine this gives a significant performance benefit. I haven't tried actually measuring it.) But, conceptually, that's just an optimisation; you can implement it using the existing lower-level concurrency primitives.

custom Prelude module -- bad idea?

I just realized that I can define my own Prelude module and carefully control its exports.
Is this considered bad practice?
Advantages:
No need to repeatedly import a "Common" module in large projects.
No need to write "import Prelude hiding (catch)".
In general it's a bad idea, as you end up with code written in your own idioms that isn't going to be easy for others to maintain.
To communicate with others you need a shared language of symbols. The Prelude is our core language, so if you redefine it, expect confusion.
The exception to this rule would be when developing an embedded domain-specific language. There, making a custom Prelude is entirely a good idea, and is indeed why it is possible to redefine the Prelude (and inbuilt syntax) in the first place.
By all means have your own additional modules, but don't override the Prelude.

Is there a list of GHC extensions that are considered 'safe'?

Occasionally, a piece of code I want to write isn't legal without at least one language extension. This is particularly true when trying to implement ideas in research papers, which tend to use whichever spiffy, super-extended version of GHC was available at the time the paper was written, without making it clear which extensions are actually required.
The result is that I often end up with something like this at the top of my .hs files:
{-# LANGUAGE TypeFamilies
           , MultiParamTypeClasses
           , FunctionalDependencies
           , FlexibleContexts
           , FlexibleInstances
           , UndecidableInstances
           , OverlappingInstances #-}
I don't mind that, but often I feel as though I'm making blind sacrifices to appease the Great God of GHC. It complains that a certain piece of code isn't valid without language extension X, so I add a pragma for X. Then it demands that I enable Y, so I add a pragma for Y. By the time this finishes, I've enabled three or four language extensions that I don't really understand, and I have no idea which ones are 'safe'.
To explain what I mean by 'safe':
I understand that UndecidableInstances is safe, because although it may cause the compiler to not terminate, as long as the code compiles it won't have unexpected side effects.
On the other hand, OverlappingInstances is clearly unsafe, because it makes it very easy for me to accidentally write code that gives runtime errors.
Is there a list of GHC extensions which are considered 'safe' and which are 'unsafe'?
It's probably best to look at what Safe Haskell allows:
Safe Language
The Safe Language (enabled through -XSafe) restricts things in two different ways:
Certain GHC LANGUAGE extensions are disallowed completely.
Certain GHC LANGUAGE extensions are restricted in functionality.
Below is precisely what flags and extensions fall into each category:
Disallowed completely: GeneralizedNewtypeDeriving, TemplateHaskell
Restricted functionality: OverlappingInstances, ForeignFunctionInterface, RULES, Data.Typeable
See Restricted Features below
Doesn't Matter: all remaining flags.
Restricted and Disabled GHC Haskell Features
In the Safe language dialect we restrict the following Haskell language features:
ForeignFunctionInterface: This is mostly safe, but foreign import declarations that import a function with a non-IO type are disallowed. All FFI imports must reside in the IO monad.
RULES: As they can change the behaviour of trusted code in unanticipated ways, violating semantic consistency, they are restricted in function. Specifically, any RULES defined in a module M compiled with -XSafe are dropped. RULES defined in Trustworthy modules that M imports are still valid and will fire as usual.
OverlappingInstances: This extension can be used to violate semantic consistency, because malicious code could redefine a type instance (by containing a more specific instance definition) in a way that changes the behaviour of code importing the untrusted module. The extension is not disabled for a module M compiled with -XSafe but restricted. While M can define overlapping instance declarations, they can only be used in M. If in a module N that imports M, at a call site that uses a type-class function there is a choice of which instance to use (i.e. overlapping) and the most specific choice is from M (or any other Safe compiled module), then compilation will fail. It is irrelevant whether module N is considered Safe, Trustworthy, or neither.
Data.Typeable: We allow instances of Data.Typeable to be derived but we don't allow hand crafted instances. Derived instances are machine generated by GHC and should be perfectly safe but hand crafted ones can lie about their type and allow unsafe coercions between types. This is in the spirit of the original design of SYB.
In the Safe language dialect we disable completely the following Haskell language features:
GeneralizedNewtypeDeriving: It can be used to violate constructor access control, by allowing untrusted code to manipulate protected data types in ways the data type author did not intend, i.e. it can be used to break invariants of data structures.
TemplateHaskell: It is particularly dangerous, as it can cause side effects even at compilation time and can be used to access abstract data types. It is very easy to break module boundaries with TH.
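To make the restricted features concrete, here is a sketch of a module compiled with Safe that stays within the rules quoted above: its foreign import has an IO result type, and its overlapping instances are defined and used inside the same module. It assumes GHC 7.10 or later (for the per-instance OVERLAPPING pragma) and that the C math library's sin is linked, which is normally the case; Describe and c_sin are hypothetical names.

```haskell
{-# LANGUAGE Safe #-}
{-# LANGUAGE ForeignFunctionInterface #-}
{-# LANGUAGE FlexibleInstances #-}

-- FFI under Safe: the imported function must have an IO result type.
-- Declaring it as `Double -> Double` instead would be rejected.
foreign import ccall "math.h sin" c_sin :: Double -> IO Double

-- Overlapping instances: allowed here because both the general and the more
-- specific instance are defined, and used, in this same module.
class Describe a where
  describe :: a -> String

instance Describe [a] where
  describe _ = "some list"

-- Chosen for String, since [Char] is more specific than [a].
instance {-# OVERLAPPING #-} Describe [Char] where
  describe s = "the string " ++ show s
```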
I recall having read that the interaction of FunctionalDependencies and UndecidableInstances can also be unsafe, because beyond allowing an unlimited context stack depth UndecidableInstances also lifts the so-called coverage condition (section 7.6.3.2), but I can't find a cite for this at the moment.
EDIT 2015-10-27: Ever since GHC gained support for type roles, GeneralizedNewtypeDeriving is no longer unsafe. (I'm not sure what else might have changed.)
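A rough sketch of why roles matter here, assuming GHC 7.8 or later: as far as I understand, GeneralizedNewtypeDeriving is now implemented via the same coerce mechanism, so a nominal role annotation on a type protects its invariant from both. SortedList and Age are hypothetical types chosen for illustration.

```haskell
{-# LANGUAGE RoleAnnotations #-}
import Data.Coerce (coerce)
import Data.List (nub, sort)

newtype Age = Age Int deriving (Show, Eq, Ord)

-- Invariant: the contents are sorted and duplicate-free w.r.t. Ord a.
newtype SortedList a = SortedList [a]
-- Nominal role: SortedList Age and SortedList Int may not be coerced into
-- each other, because their Ord instances could differ (e.g. Data.Ord.Down).
type role SortedList nominal

fromList :: Ord a => [a] -> SortedList a
fromList = SortedList . nub . sort

toList :: SortedList a -> [a]
toList (SortedList xs) = xs

-- Allowed: the list type's parameter is representational.
agesToInts :: [Age] -> [Int]
agesToInts = coerce

-- Rejected by the type checker if uncommented, because of the nominal role:
-- sortedAgesToInts :: SortedList Age -> SortedList Int
-- sortedAgesToInts = coerce
```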

Resources