How to know when an apparently pure Haskell interface hides unsafe operations? - haskell

I have been reading about unsafePerformIO lately, and I would like to ask you something. I'm OK with the fact that a real language should be able to interact with the external environment, so unsafePerformIO is somewhat justified.
However, at the best of my knowledge, I'm not aware of any quick way to know whether an apparently pure (judging from the types) interface/library is really pure without inspecting the code in search for calls to unsafePerformIO (documentation could omit to mention it).
I know it should be used only when you're sure that referential transparency is guaranteed, but I would like to know about it nevertheless.

There's no way without checking the source code. But that isn't too difficult, as Haddock gives a nice link directly to syntax-highlighted definitions right in the documentation. See the "Source" links to the right of the definitions on this page for an example.
Safe Haskell is relevant here; it's used to compile Haskell code in situations where you want to disallow usage of unsafe functionality. If a module uses an unsafe module (such as System.IO.Unsafe) and isn't specifically marked as Trustworthy, it'll inherit its unsafe status. But modules that use unsafePerformIO will generally be using it safely, and thus declare themselves Trustworthy.

In the case that you're thinking, the use of unsafePerformIO is unjustified. The documentation for unsafePerformIO explains this: it's only meant for cases where the implementer can prove that there is no way of breaking referential transparency, i.e., "purely functional" semantics. I.e., if anybody uses unsafePerformIO in a way that a purely functional program can detect it (e.g., write a function whose result depends on more than just its arguments), then that is a disallowed usage.
If you ran into a such a case, the most likely possibility is that you've found a bug.

Related

Should I make my Haskell modules Safe by default?

Is there any downside to marking the modules Safe? Should it be the assumed default?
As you can read in the GHC user manual, if you mark your modules Safe, you are restricted to a certain "safe" subset of the Haskell language.
You can only import modules which are also marked Safe (which rules out functions like unsafeCoerce or unsafePerformIO).
You can't use Template Haskell.
You can't use FFI outside of IO.
You can't use GeneralisedNewtypeDeriving.
You can't manually implement the Generic typeclass for the types you define in your module.
etc.
In exchange, the users of the module gain a bunch of guarantees (quoting from the manual):
Referential transparency — The types can be trusted. Any pure function, is guaranteed to be pure. Evaluating them is deterministic
and won’t cause any side effects. Functions in the IO monad are still
allowed and behave as usual. So, for example, the unsafePerformIO ::
IO a -> a function is disallowed in the safe language to enforce this
property.
Module boundary control — Only symbols that are publicly available through other module export lists can be accessed in the safe
language. Values using data constructors not exported by the defining
module, cannot be examined or created. As such, if a module M
establishes some invariants through careful use of its export list,
then code written in the safe language that imports M is guaranteed to
respect those invariants.
Semantic consistency — For any module that imports a module written in the safe language, expressions that compile both with and without
the safe import have the same meaning in both cases. That is,
importing a module written in the safe language cannot change the
meaning of existing code that isn’t dependent on that module. So, for
example, there are some restrictions placed on the use of
OverlappingInstances, as these can violate this property.
Strict subset — The safe language is strictly a subset of Haskell as implemented by GHC. Any expression that compiles in the safe language
has the same meaning as it does when compiled in normal Haskell.
Note that safety is inferred. If your module is not marked with any of Safe, Trustworthy, or Unsafe, GHC will infer the safety of the module to Safe or Unsafe. When you set the Safe flag, then GHC will issue an error if it decides the module is not actually safe. You can also set -Wunsafe, which emits a warning if a module is inferred to be unsafe. If you let it be inferred, your module will continue to compile even if your dependencies' safety statuses change. If you write it out, you promise to your users that the safety status is stable and dependable.
One use case described in the manual refers to running "untrusted" code. If you provide extension points of any kind in your product and you want to make sure that those capabilities aren't used to attack your product, you can require the code for the extension points to be marked Safe.
You can mark your module Trustworthy and this doesn't restrict you in any way while implementing your module. Your module might be used from Safe code then and it is your responsibility to not violate the guaranties, that should be given by Safe code. So this is a promise, you the author of said module, give. You can use the flag -fpackage-trust to enable extra checks while compiling modules marked as Trustworthy described here.
So, if you write normal libraries and don't have a good reason to care about Safe Haskell, then you probably shouldn't care. If your modules are safe, thats fine and can be inferred. If not, then this is probably for a reason, like because you used unsafePerformIO, which is also fine. If you know your module will be used in a way that requires it to compile under -XSafe (e.g. plugins, as above), you should do it. In all other cases, don't bother.

Haskell compiler magic: what requires a special treatment from the compiler?

When trying to learn Haskell, one of the difficulties that arise is the ability when something requires special magic from the compiler. One exemple that comes in mind is the seq function which can't be defined i.e. you can't make a seq2 function behaving exactly as the built-in seq. Consequently, when teaching someone about seq, you need to mention that seq is special because it's a special symbol for the compiler.
Another example would be the do-notation which only works with instances of the Monad class.
Sometimes, it's not always obvious. For instance, continuations. Does the compiler knows about Control.Monad.Cont or is it plain old Haskell that you could have invented yourself? In this case, I think nothing special is required from the compiler even if continuations are a very strange kind of beast.
Language extensions set aside, what other compiler magic Haskell learners should be aware of?
Nearly all the ghc primitives that cannot be implemented in userland are in the ghc-prim package. (it even has a module called GHC.Magic there!)
So browsing it will give a good sense.
Note that you should not use this module in userland code unless you know exactly what you are doing. Most of the usable stuff from it is exported in downstream modules in base, sometimes in modified form. Those downstream locations and APIs are considered more stable, while ghc-prim makes no guarantees as to how it will act from version to version.
The GHC-specific stuff is reexported in GHC.Exts, but plenty of other things go into the Prelude (such as basic data types, as well as seq) or the concurrency libraries, etc.
Polymorphic seq is definitely magic. You can implement seq for any specific type, but only the compiler can implement one function for all possible types [and avoid optimising it away even though it looks no-op].
Obviously the entire IO monad is deeply magic, as is everything to with concurrency and parallelism (par, forkIO, MVar), mutable storage, exception throwing and catching, querying the garbage collector and run-time stats, etc.
The IO monad can be considered a special case of the ST monad, which is also magic. (It allows truly mutable storage, which requires low-level stuff.)
The State monad, on the other hand, is completely ordinary user-level code that anybody can write. So is the Cont monad. So are the various exception / error monads.
Anything to do with syntax (do-blocks, list comprehensions) is hard-wired into the language definition. (Note, though, that some of these respond to LANGUAGE RebindableSyntax, which lets you change what functions it binds to.) Also the deriving stuff; the compiler "knows about" a handful of special classes and how to auto-generate instances for them. Deriving for newtype works for any class though. (It's just copying an instance from one type to another identical copy of that type.)
Arrays are hard-wired. Much like every other programming language.
All of the foreign function interface is clearly hard-wired.
STM can be implemented in user code (I've done it), but it's currently hard-wired. (I imagine this gives a significant performance benefit. I haven't tried actually measuring it.) But, conceptually, that's just an optimisation; you can implement it using the existing lower-level concurrency primitives.

Modify an argument in a pure function

I am aware that I should not do this
I am asking for dirty hacks to do something nasty
Goal
I wish to modify an argument in a pure function.
Thus achieving effect of pass by reference.
Illustration
For example, I have got the following function.
fn :: a -> [a] -> Bool
fn a as = elem a as
I wish to modify the arguments passed in. Such as the list as.
as' = [1, 2, 3, 4]
pointTo as as'
It is a common misconception that Haskell just deliberately prevents you from doing stuff it deems unsafe, though most other languages allow it. That's the case, for instance, with C++' const modifiers: while they are a valuable guarantee that something which is supposed to stay constant isn't messed with by mistake, it's generally assumed that the overall compiled result doesn't really change when using them; you still get basically impure functions, implemented as some assembler instructions over a stack memory frame.
What's already less known is that even in C++, these const modifiers allow the compiler to apply certain optimisations that can completely break the results when the references are modified (as possible with const_cast – which in Haskell would have at least “unsafe” in its name!); only the mutable keyword gives the guarantee that hidden modifications don't mess up the sematics you'd expect.
In Haskell, the same general issue applies, but much stronger. A Haskell function is not a bunch of assembler instructions operating on a stack frame and memory locations pointed to from the stack. Sometimes the compiler may actually generate something like that, but in general the standard is very careful not to suggest any particular implementation details. So actually, functions may get inlined, stream-fused, rule-replaced etc. to something where it totally wouldn't make sense to even think about “modifying the arguments”, because perhaps the arguments never even exist as such. The thing that Haskell guarantees is that, in a much higher-level sense, the semantics are preserved; the semantics of how you'd think about a function mathematically. In mathematics, it also doesn't make sense at all, conceptually, to talk about “modified arguments”.

Departmental restriction against unsafePerformIO

There has been some talk at work about making it a department-wide policy of prohibiting the use of unsafePerformIO and its ilk. Personally, I don't really mind as I've always maintained that if I found myself wanting to use it, it usually meant that I need to rethink my approach.
Does this restriction sound reasonable? I seem to remember reading somewhere that it was included mainly for FFI, but I can't remember where I read that at the moment.
edit:
Ok, that's my fault. It wouldn't be restricted where it's reasonably needed, ie. FFI. The point of the policy is more to discourage laziness and code smells.
A lot of core libraries like ByteString use unsafePerformIO under the hood, for example to customize memory allocation.
When you use such a library, you're trusting that the library author has proven the referential transparency of their exported API, and that any necessary preconditions for the user are documented. Rather than a blanket ban, your department should establish a policy and a review process for making similar assurances internally.
Well, there are valid uses for unsafePerformIO. It's not there just to be decorative, or as a temptation to test your virtue. None of those uses, however, involve adding meaningful side effects to everyday code. Here's a few examples of uses that can potentially be justified, with various degrees of suspicion:
Wrapping a function that's impure internally, but has no externally observable side effects. This is the same basic idea as the ST monad, except that here the burden is on the programmer to show that the impurity doesn't "leak".
Disguising a function that's deliberately impure in some restricted way. For instance, write-only impurity looks the same as total purity "from the inside", since there's no way to observe the output that's produced. This can be useful for some kinds of logging or debugging, where you explicitly don't want the consistency and well-defined ordering required by the IO monad. An example of this is Debug.Trace.trace, which I sometimes refer to as unsafePerformPrintfDebugging.
Introspection on pure computations, producing a pure result. A classic example is something like the unambiguous choice operator, which can run two equivalent pure functions in parallel in order to get an answer quicker.
Internally unobservable breaking of referential transparency, such as introducing nondeterminism when initializing data. As long as each impure function is evaluated only once, referential transparency will be effectively preserved during any single run of the program, even if the same faux-pure function called with the same arguments gives different results on different runs.
The important thing to note about all of the above is that the resulting impurity is carefully controlled and limited in scope. Given a more fine-grained system of controlling side-effects than the all-purpose IO monad, these would all be obvious candidates for slicing off bits of semi-purity, much like the controlled mutable state in the aforementioned ST monad.
Post scriptum: If a hard-line stance against any non-required use of unsafePerformIO is being considered, I strongly encourage
extending the prohibition to include unsafeInterleaveIO and any functions that allow observation of its behavior. It's at least as sketchy as some of the unsafePerformIO examples I listed above, if you ask me.
unsafePerformIO is the runST of the IO monad. It is sometimes essential. However, unlike runST, the compiler cannot check that you are preserving referential transparency.
So if you use it, the programmer has a burden to explain why the use is safe. It shouldn't be banned, it should be accompanied with evidence.
Outlawing unsafePerformIO in "application" code is an excellent idea. In my opinion there is no excuse for unsafePerformIO in normal code and in my experience it is not needed. It is really not part of the language so you are not really programming in Haskell any more if you use it. How do you know what it even means?
On the other hand, using unsafePerformIO in an FFI binding is reasonable if you know what you are doing.
Outlawing unsafePerformIO is a terrible idea, because it effectively locks code into the IO monad: for example, a c library binding will almost always be in the IO monad - however, using unsafePerformIO a higher-level purely functional library can be built on top of it.
Arguably, unsafePerformIO reflects the compromise between the highly stateful model of the personal computer and the pure, stateless model of haskell; even a function call is a stateful from the computer's point of view since it requires pushing arguments onto a stack, messing with registers, etc., but the usage is based on the knowledge that these operations do in fact compose functionally.

Is Haskell really a purely functional language considering unsafePerformIO?

Haskell is generally referenced as an example of a purely functional language. How can this be justified given the existence of System.IO.Unsafe.unsafePerformIO ?
Edit: I thought with "purely functional" it was meant that it is impossible to introduce impure code into the functional part of the program.
The Languages We Call Haskell
unsafePerformIO is part of the Foreign Function Interface specification, not core Haskell 98 specification. It can be used to do local side effects that don't escape some scope, in order to expose a purely functional interface. That is, we use it to hide effects when the type checker can't do it for us (unlike the ST monad, which hides effects with a static guarantee).
To illustrate precisely the multiple languages that we call "Haskell", consider the image below. Each ring corresponds to a specific set of computational features, ordered by safety, and with area correlating to expressive power (i.e. the number of programs you can write if you have that feature).
The language known as Haskell 98 is specified right down in the middle, admitting total and partial functions. Agda (or Epigram), where only total functions are allowed, is even less expressive, but "more pure" and more safe. While Haskell as we use it today includes everything out to the FFI, where unsafePerformIO lives. That is, you can write anything in modern Haskell, though if you use things from the outer rings, it will be harder to establish safety and security guarantees made simple by the inner rings.
So, Haskell programs are not typically built from 100% referentially transparent code, however, it is the only moderately common language that is pure by default.
I thought with "purely functional" it was meant that it is impossible to introduce impure code...
The real answer is that unsafePerformIO is not part of Haskell, any more than say, the garbage collector or run-time system are part of Haskell. unsafePerformIO is there in the system so that the people who build the system can create a pure functional abstraction over very effectful hardware. All real languages have loopholes that make it possible for system builders to get things done in ways that are more effective than dropping down to C code or assembly code.
As to the broader picture of how side effects and I/O fit into Haskell via the IO monad, I think the easiest way to think of Haskell is that it is a pure language that describes effectful computations. When the computation described is main, the run-time system carries out those effects faithfully.
unsafePerformIO is a way to get effects in an unsafe manner; where "unsafe" means "safety must be guaranteed by the programmer"—nothing is checked by the compiler. If you are a savvy programmer and are willing to meet heavy proof obligations, you can use unsafePerformIO. But at that point you are not programming in Haskell any more; you are programming in an unsafe language that looks a lot like Haskell.
The language/implementation is purely functional. It includes a couple "escape hatches," which you don't have to use if you don't want to.
I don't think unsafePerformIO means that haskell somehow becomes impure. You can create pure (referentially transparent) functions from impure functions.
Consider the skiplist. In order for it to work well it requires access to a RNG, an impure function, but this doesn't make the data structure impure. If you add an item and then convert it to a list, the same list will be returned every time given the item you add.
For this reason I think unsafePerformIO should be thought of as promisePureIO. A function that means that functions that have side-effects and therefore would be labelled impure by the type system can become acknowledged as referentially transparent by the type system.
I understand that you have to have a slightly weaker definition of pure for this to hold though. i.e pure functions are referentially transparent and never called because of a side-effect (like print).
Unfortunately the language has to do some real world work, and this implies talking with the external environment.
The good thing is that you can (and should) limit the usage of this "out of style" code to few specific well documented portions of your program.
I have a feeling I'll be very unpopular for saying what I'm about to say, but felt I had to respond to some of the (in my opinion mis-) information presented here.
Although it's true that unsafePerformIO was officially added to the language as part of the FFI addendum, the reasons for this are largely historical rather than logical. It existed unofficially and was widely used long before Haskell ever had an FFI. It was never officially part of the main Haskell standard because, as you have observed, it was just too much of an embarrassment. I guess the hope was that it would just go away at some point in the future, somehow. Well that hasn't happened, nor will it in my opinion.
The development of FFI addendum provided a convenient pretext for unsafePerformIO to get snuck in to the official language standard as it probably doesn't seem too bad here, when compared to adding the capability to call foreign (I.E. C) code (where all bets are off regarding statically ensuring purity and type safety anyway). It was also jolly convenient to put it here for what are essentially political reasons. It fostered the myth that Haskell would be pure, if only it wasn't for all that dirty "badly designed" C, or "badly designed" operating systems, or "badly designed" hardware or .. whatever.. It's certainly true that unsafePerformIO is regularly used with FFI related code, but the reasons for this are often more to do with bad design of the FFI and indeed of Haskell itself, not bad design of whatever foreign thing Haskell is trying interface too.
So as Norman Ramsey says, the official position came to be that it was OK to use unsafePerformIO provided certain proof obligations were satisfied by whoever used it (primarily that doing this doesn't invalidate important compiler transformations like inlining and common sub-expression elimination). So far so good, or so one might think. The real kicker is that these proof obligations cannot be satisfied by what is probably the single most common use case for unsafePerformIO, which by my estimate accounts for well over 50% of all the unsafePerformIOs out there in the wild. I'm talking about the appalling idiom known as the "unsafePerformIO hack" which is provably (in fact obviously) completely unsafe (in the presence of inlining and cse) .
I don't really have the time, space or inclination to go into what the "unsafePerformIO hack" is or why it's needed in real IO libraries, but the bottom line is that folk who work on Haskells IO infrastructure are usually "stuck between a rock and a hard place". They can either provide an inherently safe API which has no safe implementation (in Haskell), or they can provide an inherently unsafe API which can be safely implemented, but what they can rarely do is provide safety in both API design and implementation. Judging by the depressing regularity with which the "unsafePerformIO hack" appears in real world code (including the Haskell standard libraries), it seems most choose the former option as the lesser of the two evils, and just hope that the compiler won't muck things up with inlining, cse or any other transformation.
I do wish all this was not so. Unfortunately, it is.
Safe Haskell, a recent extension of GHC, gives a new answer to this question. unsafePerformIO is a part of GHC Haskell, but not a part of the safe dialect.
unsafePerformIO should be used only to build referentially transparent functions; for example, memoization. In these cases, the author of a package marks it as "trustworthy". A safe module can import only safe and trustworthy modules; it cannot import unsafe modules.
For more information: GHC manual, Safe Haskell paper
Haskell is generally referenced as an example of a purely functional language. How can this be justified given the existence of System.IO.Unsafe.unsafePerformIO ?
Edit: I thought with "purely functional" it was meant that it is impossible to introduce impure code into the functional part of the program.
The IO monad is actually defined in Haskell, and you can in fact see its definition here. This monad does not exist to deal with impurities but rather to handle side effects. In any case, you could actually pattern match your way out of the IO monad, so the existence of unsafePerformIO shouldn't really be troubling to you.

Resources