Is there any downside to marking the modules Safe? Should it be the assumed default?
As you can read in the GHC user manual, if you mark your modules Safe, you are restricted to a certain "safe" subset of the Haskell language.
You can only import modules that are themselves trusted (Safe, inferred safe, or Trustworthy), which rules out modules exposing functions like unsafeCoerce or unsafePerformIO; see the sketch after this list.
You can't use Template Haskell.
You can't use FFI outside of IO.
You can't use GeneralisedNewtypeDeriving.
You can't manually implement the Generic typeclass for the types you define in your module.
etc.
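For instance, a module marked Safe that tries to pull in an unsafe module is rejected at compile time. A minimal sketch (the module name is made up):

{-# LANGUAGE Safe #-}
module Example where

-- GHC rejects this import: System.IO.Unsafe is marked Unsafe,
-- and a Safe module may only import trusted modules.
import System.IO.Unsafe (unsafePerformIO)

answer :: Int
answer = unsafePerformIO (return 42)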
In exchange, the users of the module gain a bunch of guarantees (quoting from the manual):
Referential transparency — The types can be trusted. Any pure function is guaranteed to be pure: evaluating it is deterministic and won't cause any side effects. Functions in the IO monad are still allowed and behave as usual. So, for example, the unsafePerformIO :: IO a -> a function is disallowed in the safe language to enforce this property.

Module boundary control — Only symbols that are publicly available through other modules' export lists can be accessed in the safe language. Values using data constructors not exported by the defining module cannot be examined or created. As such, if a module M establishes some invariants through careful use of its export list, then code written in the safe language that imports M is guaranteed to respect those invariants.

Semantic consistency — For any module that imports a module written in the safe language, expressions that compile both with and without the safe import have the same meaning in both cases. That is, importing a module written in the safe language cannot change the meaning of existing code that isn't dependent on that module. So, for example, there are some restrictions placed on the use of OverlappingInstances, as these can violate this property.

Strict subset — The safe language is strictly a subset of Haskell as implemented by GHC. Any expression that compiles in the safe language has the same meaning as it does when compiled in normal Haskell.
Note that safety is inferred. If your module is not marked with any of Safe, Trustworthy, or Unsafe, GHC will infer for itself whether it is Safe or Unsafe. If you do set the Safe flag, GHC will issue an error when it decides the module is not actually safe. You can also pass -Wunsafe, which emits a warning whenever a module is inferred to be unsafe. If you let safety be inferred, your module will continue to compile even if your dependencies' safety statuses change; if you write it out explicitly, you promise your users that the safety status is stable and dependable.
One use case described in the manual refers to running "untrusted" code. If you provide extension points of any kind in your product and you want to make sure that those capabilities aren't used to attack your product, you can require the code for the extension points to be marked Safe.
You can also mark your module Trustworthy, which doesn't restrict you in any way while implementing it. Your module can then be used from Safe code, and it becomes your responsibility not to violate the guarantees that Safe code is supposed to provide. So this is a promise that you, the author of the module, give. Consumers can additionally compile with -fpackage-trust to enable the extra package-trust checks for Trustworthy modules described in the manual.
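As an illustration, here is a sketch of the kind of module one might mark Trustworthy: it uses unsafePerformIO internally for a global interning table, but the claim is that the single exported function behaves like a pure one (module and function names are made up):

{-# LANGUAGE Trustworthy #-}
module MyLib.Interned (intern) where

import qualified Data.Map.Strict as Map
import Data.IORef (IORef, newIORef, atomicModifyIORef')
import System.IO.Unsafe (unsafePerformIO)

-- Global table of strings seen so far; not exported, so users
-- cannot observe it directly.
table :: IORef (Map.Map String String)
table = unsafePerformIO (newIORef Map.empty)
{-# NOINLINE table #-}

-- | Return a shared copy of the string. Observationally this is a pure
-- function: equal inputs always give equal outputs.
intern :: String -> String
intern s = unsafePerformIO $
  atomicModifyIORef' table $ \m ->
    case Map.lookup s m of
      Just s' -> (m, s')
      Nothing -> (Map.insert s s m, s)
{-# NOINLINE intern #-}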
So, if you write ordinary libraries and don't have a good reason to care about Safe Haskell, you probably shouldn't care. If your modules are safe, that's fine and will be inferred. If they aren't, there is probably a reason for it, e.g. you used unsafePerformIO, which is also fine. If you know your module will be used in a way that requires it to compile under -XSafe (e.g. plugins, as above), mark it explicitly. In all other cases, don't bother.
Related
In Rust, I am not allowed (for good reasons) to, for example, implement libA::SomeTrait for libB::SomeType.
Are there similar rules enforced in Haskell, or does Haskell just sort of unify the whole program and make sure it's coherent as a whole?
Edit: To phrase the question more clearly in Haskell terms, can you have an orphan instance where the class and the type are defined in modules from external packages (say, from Hackage)?
It looks like you can have an orphan instance wrt. modules. Can the same be done wrt. packages?
Yes. In Haskell, the compiler doesn't really concern itself with packages at all, except for looking up which modules are available. (Basically, Cabal or Stack tells GHC which ones are available, and GHC just uses them.) And an available module in another package is treated exactly the same as a module in your own package.
Most often, this is not a problem – people are expected to be sensible regarding what instances they define. If they do define a bad instance, it only matters for people who use both libA and libB, and they should know.
If you do have a particular reason why you want to prevent somebody from instantiating a class, you should just not export it. (You can still export functions with that class in their constraint.)
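For illustration, an orphan instance spanning two packages might look like the sketch below; LibA.Class, LibB.Types, Pretty and UserId are hypothetical names standing in for things defined in two unrelated Hackage packages:

{-# OPTIONS_GHC -Wno-orphans #-}  -- GHC only warns about orphan instances
module MyApp.Orphans () where

import LibA.Class (Pretty (..))  -- hypothetical: class Pretty a where pretty :: a -> String
import LibB.Types (UserId (..))  -- hypothetical: newtype UserId = UserId Int

-- Neither package knows about the other, yet GHC accepts the instance.
instance Pretty UserId where
  pretty (UserId n) = "user#" ++ show n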
Looking at some code on Hackage, I stumbled upon the Safe and Trustworthy extensions.
What do they mean broadly (or also exactly...)? Is there a good rule of thumb on when to use them and when not?
Safe Haskell is in essence a subset of the Haskell language. It aims to disable certain "tricks" that are often used, like unsafePerformIO :: IO a -> a, and furthermore it (aims to) guarantee that you cannot somehow get access to private functions, data constructors, etc. of a module that deliberately does not export them. In short, Safe Haskell guarantees three things:
Referential transparency;
Module boundary control; and
Semantic consistency.
A Safe module has to follow these limitations, and furthermore may only import other trusted (Safe or Trustworthy) modules. Being Safe of course does not mean that the code is correct, or that an IO action, for example, cannot be malicious. But it does give some guarantees: if a function's type contains no IO, the module should normally not be able to perform IO in an unsafe, hidden way through it.
Certain extensions like TemplateHaskell and the {-# RULES … #-} pragma are not allowed in Safe Haskell. Other extensions, like DeriveDataTypeable, are allowed, but only if one uses the deriving clause to generate the instance rather than writing a custom one.
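For example, something along these lines is accepted, because the instance comes from a deriving clause; writing the corresponding instance by hand is one of the documented restrictions (a small sketch using DeriveGeneric):

{-# LANGUAGE Safe #-}
{-# LANGUAGE DeriveGeneric #-}
module Colours (Colour (..)) where

import GHC.Generics (Generic)

-- Allowed: the Generic instance is generated by the deriving clause.
-- A hand-written `instance Generic Colour` would be rejected here.
data Colour = Red | Green | Blue
  deriving (Show, Eq, Generic)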
Some modules, however, need to make use of such extensions in order to work properly. In that case, the author can mark the module as Trustworthy. That means the author claims that the module exposes a safe API, but that internally it needs to work with some unsafe extensions, pragmas, etc. The compiler thus cannot guarantee safety by itself.
These extensions are described in the GHC documentation:

The Safe Haskell extension introduces the following three language flags:

-XSafe — Enables the safe language dialect, asking GHC to guarantee trust. The safe language dialect requires that all imports be trusted or a compilation error will occur.

-XTrustworthy — Means that while this module may invoke unsafe functions internally, the module's author claims that it exports an API that can't be used in an unsafe way. This doesn't enable the safe language or place any restrictions on the allowed Haskell code. The trust guarantee is provided by the module author, not GHC. An import statement with the safe keyword results in a compilation error if the imported module is not trusted. An import statement without the keyword behaves as usual and can import any module whether trusted or not.

-XUnsafe — Marks the module being compiled as unsafe so that modules compiled using -XSafe can't import it.
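The safe import keyword mentioned above looks like this; a sketch, where the importing module just needs one of the Safe Haskell pragmas enabled for the keyword to be recognised:

{-# LANGUAGE Trustworthy #-}
module Host where

-- `safe` turns this into a trusted import: compilation fails if
-- Data.Map.Strict is not considered trusted.
import safe qualified Data.Map.Strict as Map

lookupName :: Int -> Map.Map Int String -> Maybe String
lookupName = Map.lookup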
I have a library that includes a type data Zq q = Zq Int representing the integers mod q. For safety, I'd like to expose some operations on this type ((+), (*), etc.), but not export the constructor, to avoid people circumventing the safety gained by declaring such a type in the first place.
However, users of the library may reasonably need to declare instances for this type that I as the library author can't predict. To name just a few possible instances: DeepSeq, Storable, Unbox, ...
The only way I know of that allows third parties to make such instances is to export the constructor. (Alternatively, I could define and export a smart constructor and destructor, but this seems to be no better than just exporting the data constructor.)
Is there a way to ensure safety while also allowing third parties to extend the type?
Most well-formed instances shouldn't require the unsafe raw constructors. Unbox etc. are a bit unusually low-level, but other instances should generally be definable in terms of much the same high-level API you'd also use for end applications.
So I don't really see how "I don't know the instances" implies "I can't hide the constructors". If you just define the critical close-to-the-metal instances yourself, you should be fine.
That said, I often find it rather annoying if a library doesn't export its constructors at all. Even if every instance and everything else can be defined using only the high-level API, it can make sense to grant unsafe low-level access for a lot of reasons that can't really be foreseen: debugging, special optimisations, simply seeing what's going on... Hence, in a similar vein to Python's "we're all consenting adults here" philosophy, I'd support kosmikus' suggestion: export the constructors of all important types, but do it in a way that makes it clear that using them directly is unsafe. An extra Unsafe module is a good way to achieve this. Simply giving the constructor a technical-sounding name may also be sufficient. And of course document what precisely is unsafe about these exports.
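A sketch of that layout for the Zq example (file and module names are only illustrative):

-- Math/Zq/Unsafe.hs: raw access for instance writers, debugging, special
-- optimisations. Using the constructor directly can break the mod-q invariant.
module Math.Zq.Unsafe (Zq (..)) where

data Zq q = Zq Int

-- Math/Zq.hs: the ordinary public module re-exports the type abstractly,
-- so casual users never see the constructor.
module Math.Zq (Zq) where

import Math.Zq.Unsafe (Zq)

A user who really needs, say, a Storable or an NFData instance then imports Math.Zq.Unsafe explicitly, so the unsafe dependency is visible right in their import list.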
When trying to learn Haskell, one of the difficulties that arises is telling when something requires special magic from the compiler. One example that comes to mind is the seq function, which can't be defined in plain Haskell: you can't write a seq2 function that behaves exactly like the built-in seq. Consequently, when teaching someone about seq, you need to mention that seq is special, because the compiler treats it specially.
Another example would be the do-notation which only works with instances of the Monad class.
Sometimes it's not obvious. For instance, continuations: does the compiler know about Control.Monad.Cont, or is it plain old Haskell that you could have invented yourself? In this case, I think nothing special is required from the compiler, even though continuations are a very strange kind of beast.
Language extensions set aside, what other compiler magic should Haskell learners be aware of?
Nearly all the GHC primitives that cannot be implemented in userland live in the ghc-prim package (it even has a module called GHC.Magic!). Browsing it will give you a good sense of what is built in.
Note that you should not use this package in userland code unless you know exactly what you are doing. Most of the usable stuff from it is re-exported by downstream modules in base, sometimes in modified form. Those downstream locations and APIs are considered more stable, while ghc-prim makes no guarantees about how it will behave from version to version.
The GHC-specific stuff is reexported in GHC.Exts, but plenty of other things go into the Prelude (such as basic data types, as well as seq) or the concurrency libraries, etc.
Polymorphic seq is definitely magic. You can implement seq for any specific type, but only the compiler can implement one function that works for all possible types (and keep it from being optimised away even though it looks like a no-op).
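For example, a seq restricted to one concrete type is easy, because pattern matching on the outermost constructor already forces the value to weak head normal form:

-- Behaves like seq, but only for Maybe values: matching on the
-- constructor evaluates the first argument to WHNF before returning b.
seqMaybe :: Maybe a -> b -> b
seqMaybe Nothing  b = b
seqMaybe (Just _) b = b

The fully polymorphic version can't be written this way: with no constraint on the type there is no constructor to match on, which is why seq itself has to be built in.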
Obviously the entire IO monad is deeply magic, as is everything to do with concurrency and parallelism (par, forkIO, MVar), mutable storage, exception throwing and catching, querying the garbage collector and run-time stats, etc.
The IO monad can be considered a special case of the ST monad, which is also magic. (It allows truly mutable storage, which requires low-level stuff.)
The State monad, on the other hand, is completely ordinary user-level code that anybody can write. So is the Cont monad. So are the various exception / error monads.
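For instance, a perfectly ordinary State monad can be written from scratch in a few lines; a minimal sketch, mirroring what Control.Monad.State provides:

-- A State computation is just a function from an initial state
-- to a result paired with the final state.
module MyState where

newtype State s a = State { runState :: s -> (a, s) }

instance Functor (State s) where
  fmap f (State g) = State $ \s -> let (a, s') = g s in (f a, s')

instance Applicative (State s) where
  pure a = State $ \s -> (a, s)
  State sf <*> State sa = State $ \s ->
    let (f, s')  = sf s
        (a, s'') = sa s'
    in  (f a, s'')

instance Monad (State s) where
  State sa >>= k = State $ \s ->
    let (a, s') = sa s
    in  runState (k a) s'

get :: State s s
get = State $ \s -> (s, s)

put :: s -> State s ()
put s' = State $ \_ -> ((), s')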
Anything to do with syntax (do-blocks, list comprehensions) is hard-wired into the language definition. (Note, though, that some of this responds to RebindableSyntax, which lets you change which functions the syntax desugars to.) The deriving machinery is also hard-wired: the compiler "knows about" a handful of special classes and how to auto-generate instances for them. Deriving for newtypes works for any class, though; it just copies an instance from one type to another, identical copy of that type.
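A tiny sketch of that: the compiler has no built-in knowledge of Num, but newtype deriving lets it reuse the underlying type's instance wholesale:

{-# LANGUAGE GeneralizedNewtypeDeriving #-}
module Ages where

-- Eq and Ord use ordinary stock deriving; the Num instance is simply
-- copied over from Int by newtype deriving.
newtype Age = Age Int
  deriving (Eq, Ord, Num)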
Arrays are hard-wired, much like in every other programming language.
All of the foreign function interface is clearly hard-wired.
STM can be implemented in user code (I've done it), but it's currently hard-wired. (I imagine this gives a significant performance benefit. I haven't tried actually measuring it.) But, conceptually, that's just an optimisation; you can implement it using the existing lower-level concurrency primitives.
I have been reading about unsafePerformIO lately, and I would like to ask you something. I'm OK with the fact that a real language should be able to interact with the external environment, so unsafePerformIO is somewhat justified.
However, to the best of my knowledge, there is no quick way to know whether an apparently pure (judging from the types) interface/library is really pure, short of inspecting the code for calls to unsafePerformIO (the documentation might not mention it).
I know it should be used only when you're sure that referential transparency is guaranteed, but I would like to know about it nevertheless.
There's no way without checking the source code. But that isn't too difficult, as Haddock links directly to syntax-highlighted definitions right from the documentation: see the "Source" links to the right of the definitions on any Haddock page.
Safe Haskell is relevant here; it's used to compile Haskell code in situations where you want to disallow usage of unsafe functionality. If a module uses an unsafe module (such as System.IO.Unsafe) and isn't specifically marked as Trustworthy, it'll inherit its unsafe status. But modules that use unsafePerformIO will generally be using it safely, and thus declare themselves Trustworthy.
In the case you're thinking of, the use of unsafePerformIO is unjustified. The documentation for unsafePerformIO explains this: it's only meant for cases where the implementer can prove that there is no way to break referential transparency, i.e., "purely functional" semantics. In other words, if anybody uses unsafePerformIO in a way that a purely functional program can detect (e.g., to write a function whose result depends on more than just its arguments), that is a disallowed usage.
If you run into such a case, the most likely possibility is that you've found a bug.
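For concreteness, the kind of usage that is disallowed looks like this sketch: a "pure" function whose result depends on how many times it has been called. (The NOINLINE pragmas are needed to even make the effect observable; with optimisations GHC may share or duplicate the calls and change the output, which is precisely why this is not a legitimate use.)

import Data.IORef (IORef, newIORef, atomicModifyIORef')
import System.IO.Unsafe (unsafePerformIO)

-- A hidden global counter.
counter :: IORef Int
counter = unsafePerformIO (newIORef 0)
{-# NOINLINE counter #-}

-- Breaks referential transparency: the result depends on call history,
-- not on the argument.
next :: () -> Int
next () = unsafePerformIO (atomicModifyIORef' counter (\n -> (n + 1, n + 1)))
{-# NOINLINE next #-}

main :: IO ()
main = print (next (), next ())  -- typically (1,2) without optimisations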