Departmental restriction against unsafePerformIO - haskell

There has been some talk at work about adopting a department-wide policy prohibiting the use of unsafePerformIO and its ilk. Personally, I don't really mind, as I've always maintained that if I found myself wanting to use it, it usually meant that I needed to rethink my approach.
Does this restriction sound reasonable? I seem to remember reading somewhere that it was included mainly for FFI, but I can't remember where I read that at the moment.
edit:
OK, that's my fault. It wouldn't be restricted where it's reasonably needed, i.e., FFI. The point of the policy is more to discourage laziness and code smells.

A lot of core libraries like ByteString use unsafePerformIO under the hood, for example to customize memory allocation.
When you use such a library, you're trusting that the library author has proven the referential transparency of their exported API, and that any necessary preconditions for the user are documented. Rather than a blanket ban, your department should establish a policy and a review process for making similar assurances internally.

Well, there are valid uses for unsafePerformIO. It's not there just to be decorative, or as a temptation to test your virtue. None of those uses, however, involve adding meaningful side effects to everyday code. Here are a few examples of uses that can potentially be justified, with varying degrees of suspicion:
Wrapping a function that's impure internally, but has no externally observable side effects. This is the same basic idea as the ST monad, except that here the burden is on the programmer to show that the impurity doesn't "leak".
Disguising a function that's deliberately impure in some restricted way. For instance, write-only impurity looks the same as total purity "from the inside", since there's no way to observe the output that's produced. This can be useful for some kinds of logging or debugging, where you explicitly don't want the consistency and well-defined ordering required by the IO monad. An example of this is Debug.Trace.trace, which I sometimes refer to as unsafePerformPrintfDebugging (a small demonstration appears just after this list).
Introspection on pure computations, producing a pure result. A classic example is something like the unambiguous choice operator, which can run two equivalent pure functions in parallel in order to get an answer quicker.
Internally unobservable breaking of referential transparency, such as introducing nondeterminism when initializing data. As long as each impure function is evaluated only once, referential transparency will be effectively preserved during any single run of the program, even if the same faux-pure function called with the same arguments gives different results on different runs.
The important thing to note about all of the above is that the resulting impurity is carefully controlled and limited in scope. Given a more fine-grained system of controlling side-effects than the all-purpose IO monad, these would all be obvious candidates for slicing off bits of semi-purity, much like the controlled mutable state in the aforementioned ST monad.
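To make the write-only case concrete, here is a minimal runnable sketch using Debug.Trace.trace; the trace output is a real side effect, but nothing inside the program can observe it, so factorial (a name chosen purely for illustration) remains pure from the inside:

import Debug.Trace (trace)

-- trace :: String -> a -> a prints its message when the wrapped value
-- is forced; write-only impurity the program itself cannot observe
factorial :: Integer -> Integer
factorial 0 = 1
factorial n = trace ("factorial " ++ show n) (n * factorial (n - 1))

main :: IO ()
main = print (factorial 3)

Note that when (and whether) each message prints depends on evaluation order, which is exactly the well-defined ordering one gives up by stepping outside IO.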
Post scriptum: If a hard-line stance against any non-required use of unsafePerformIO is being considered, I strongly encourage extending the prohibition to include unsafeInterleaveIO and any functions that allow observation of its behavior. It's at least as sketchy as some of the unsafePerformIO examples I listed above, if you ask me.

unsafePerformIO is the runST of the IO monad. It is sometimes essential. However, unlike runST, the compiler cannot check that you are preserving referential transparency.
So if you use it, you take on the burden of explaining why the use is safe. It shouldn't be banned; it should be accompanied by evidence.

Outlawing unsafePerformIO in "application" code is an excellent idea. In my opinion there is no excuse for unsafePerformIO in normal code, and in my experience it is not needed. It is really not part of the language, so you are not really programming in Haskell any more if you use it. How do you know what it even means?
On the other hand, using unsafePerformIO in an FFI binding is reasonable if you know what you are doing.

Outlawing unsafePerformIO is a terrible idea, because it effectively locks code into the IO monad: for example, a C library binding will almost always be in the IO monad; however, using unsafePerformIO, a higher-level purely functional library can be built on top of it.
Arguably, unsafePerformIO reflects the compromise between the highly stateful model of the personal computer and the pure, stateless model of Haskell; even a function call is stateful from the computer's point of view, since it requires pushing arguments onto a stack, messing with registers, and so on, but the usage is based on the knowledge that these operations do in fact compose functionally.
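As a sketch of that layering (the binding is illustrative, though cbrt itself is a real C function): a foreign function imported in IO can be re-exported as a pure Haskell function once we have convinced ourselves it has no observable side effects.

{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.Types (CDouble)
import System.IO.Unsafe (unsafePerformIO)

-- imported in IO, as C bindings often are
foreign import ccall unsafe "math.h cbrt"
  c_cbrt :: CDouble -> IO CDouble

-- the pure, higher-level interface built on top
cbrt :: Double -> Double
cbrt = realToFrac . unsafePerformIO . c_cbrt . realToFrac

main :: IO ()
main = print (cbrt 27)   -- 3.0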

Related

Why do some of threepenny-gui FRP combinators operate on a MonadIO monad instead of being pure?

First of all, a disclaimer: I might have completely misunderstood the way threepenny-gui works due to my not-so-advanced knowledge of Haskell, so take my assertions with a grain of salt. :-)
It seems to me that some combinators are not pure, e.g.
stepper :: MonadIO m => a -> Event a -> m (Behavior a)
Is stepper necessarily operating in some kind of IO monad (and therefore not pure), or am I misunderstanding something here? Second question: if those combinators are indeed impure, why is that? To me this makes the code for building the event graph much less nice than in reactive-banana, since one has to use an IO monad instead of just pure, plain functions. The code seems to become more complicated than in reactive-banana.
Even scarier, valueChange appears to be pure from its type
valueChange :: Element -> Event String
but it's actually using unsafeMapIO inside, so is it actually doing hidden IO? Again, I might be misunderstanding something, but isn't this the most supreme sin in Haskell? It was also confusing that I couldn't tell from the type how the callback registration was happening... this has never happened to me before in Haskell... Was this done to spare users having to deal with the Frameworks monad?
I'm not a threepenny-gui user, but I can give you a bit of info on your questions from my reading of the code.
Firstly, regarding stepper: it seems that internally threepenny-gui uses IORefs to keep track of accumulating parameters. That's why stepper requires MonadIO. However, I do not think this requires IO to build your event graph; it just means the building has to run in IO. This shouldn't cause any awkwardness for you as an API user. If you think it does, please post a more specific question.
Secondly regarding valueChange being implemented in terms of an unsafe function: Yes, this is very common to see in Haskell libraries. Haskell library authors often use unsafe functions when they know the way that they are used is actually safe for all possible execution paths, but it is not possible to prove this to the type system. In other words, library authors use unsafe operations in a safe way so that you don't have to!
This answer is complementary to Tom Ellis'. It covers how I would justify the monadic FRP combinators from the perspective of a Threepenny user.
With Threepenny, odds are that a large part of your event graph will be built from Elements, which live in UI (a thin wrapper for IO) because they are tied to a DOM in a browser. That being so, it makes sense to remove the indirection involved in externally plugging something like reactive-banana (as the old reactive-banana-threepenny package used to do). In the case of reactive-banana-threepenny the indirection was not limited to having to use the Frameworks monad, but it also involved converting between the different event types of reactive-banana and Threepenny (it is also worth considering that FRP in Threepenny is optional; you can also use events in a traditional, imperative fashion).
Overall, it is a trade-off: to eliminate some indirection and make the API more immediately accessible, some cleanliness in defining the event graphs is sacrificed. In my experience, however, the trade-off is worth it.
On a closing note, this discussion I had with Heinrich around the time of the switch away from reactive-banana-threepenny may shed some more light on the matter.

How to know when an apparently pure Haskell interface hides unsafe operations?

I have been reading about unsafePerformIO lately, and I would like to ask you something. I'm OK with the fact that a real language should be able to interact with the external environment, so unsafePerformIO is somewhat justified.
However, as far as I know, there is no quick way to tell whether an apparently pure (judging from the types) interface/library is really pure without inspecting the code in search of calls to unsafePerformIO (the documentation might not mention it).
I know it should be used only when you're sure that referential transparency is guaranteed, but I would like to know about it nevertheless.
There's no way without checking the source code. But that isn't too difficult, as Haddock gives a nice link directly to syntax-highlighted definitions right in the documentation. See the "Source" links to the right of the definitions on this page for an example.
Safe Haskell is relevant here; it's used to compile Haskell code in situations where you want to disallow usage of unsafe functionality. If a module uses an unsafe module (such as System.IO.Unsafe) and isn't specifically marked as Trustworthy, it'll inherit its unsafe status. But modules that use unsafePerformIO will generally be using it safely, and thus declare themselves Trustworthy.
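A minimal sketch of how the pragmas interact (the module and function names here are illustrative):

{-# LANGUAGE Trustworthy #-}
-- the author vouches that the exports are referentially transparent
-- despite the unsafe machinery inside
module Wrapper (sumTo) where

import Data.IORef
import System.IO.Unsafe (unsafePerformIO)

-- internally impure, externally pure: the IORef never escapes
sumTo :: Int -> Int
sumTo n = unsafePerformIO $ do
  acc <- newIORef 0
  mapM_ (\i -> modifyIORef' acc (+ i)) [1 .. n]
  readIORef acc

-- In another file:
{-# LANGUAGE Safe #-}
module Client where

import Wrapper (sumTo)       -- accepted: Wrapper is marked Trustworthy
-- import System.IO.Unsafe   -- rejected: that module is marked Unsafe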
In the case that you're thinking of, the use of unsafePerformIO is unjustified. The documentation for unsafePerformIO explains this: it's only meant for cases where the implementer can prove that there is no way of breaking referential transparency, i.e., "purely functional" semantics. That is, if anybody uses unsafePerformIO in a way that a purely functional program can detect (e.g., by writing a function whose result depends on more than just its arguments), then that is a disallowed usage.
If you ran into such a case, the most likely possibility is that you've found a bug.

Lock-free programming in Haskell

Does anyone know if it is possible to do lock-free programming in Haskell? I'm interested both in whether the appropriate low-level primitives are available, and (if they are) in any information on what works in terms of using them to build larger-scale systems in a pure functional context. (I've never done lock-free programming in a pure functional context before.) For instance, as I understand it the Control.Concurrent.Chan channels are built on top of MVars, which (as I understand things) use locks. Could one in principle build versions of the Chan primitive which are lock-free internally? How much performance gain might one hope to get?
I should also say that I'm familiar with the existence of TVars, but don't understand their internal implementation. I've been given to understand that they are mostly lock-free, but I'm not sure if they're entirely lock-free. So any information on the internal implementation of TVars would also be helpful!
(This thread provides some discussion, but I wonder if there's anything more up to date/more comprehensive.)
Not only does an MVar use locks, it is a lock abstraction. And, as I recall, individual STM primitives are optimistic, but there are locks used in various places in the STM implementation. Just remember the handy rhyme: "If it can block, then beware of locks".
For real lock-free programming you want to use IORefs directly, along with atomicModifyIORef.
Edit: regarding black holes, as I recall the implementation is lock-free, but I can't vouch for the details. The mechanism is described in "Runtime Support for Multicore Haskell": http://research.microsoft.com/en-us/um/people/simonpj/papers/parallel/multicore-ghc.pdf
But that implementation underwent some tweaks, I think, as described in Simon Marlow's 2010 Haskell Implementors Workshop talk "Scheduling Lazy Evaluation on Multicore": http://haskell.org/haskellwiki/HaskellImplementorsWorkshop/2010. The slides are unfortunately offline, but the video should still work.
Lock-free programming is trivial in Haskell. The easiest way to have a shared piece of data that needs to be modified by many threads is to start with any normal Haskell type (list, Map, Maybe, whatever you need) and place it in an IORef. Once you've done this, you have the ability to use atomicModifyIORef to perform modifications in place, which are guaranteed to take next to no time.
import Data.IORef

type MyDataStructure = [Int]
type ConcMyData = IORef MyDataStructure

main :: IO ()
main = do
  sharedData <- newIORef ([] :: MyDataStructure)
  -- ...
  atomicModifyIORef sharedData (\xs -> (1:xs, ()))
The reason this works is that a pointer to the thunk that will eventually evaluate to the result is stored inside the IORef, and whenever a thread reads from the IORef, it gets the thunk and evaluates as much of the structure as it needs. Since all threads could read this same thunk, it will only be evaluated once (and if it is evaluated more than once, it's guaranteed to always end up with the same result, so concurrent evaluations are OK). I believe this is correct; I'm happy to be corrected, though.
The take-home message is that this sort of abstraction is only easily implemented in a pure language, where values never change (except, of course, via types like IORef, MVar, and the STM types). The copy-on-write nature of Haskell's data structures means that a modified structure can share a lot of data with the original structure, while only allocating whatever is new to the structure.
I don't think I've done a very good job of explaining how this works, but I'll come back tomorrow and clarify my answer.
For more information, see the slides for the talk Multicore programming in Haskell by Simon Marlow of Microsoft Research (and one of the main GHC implementors).
Look into stm, specifically its TChan type.
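A minimal usage sketch (TChan, newTChanIO, and friends are the real stm API; whether the underlying implementation is entirely lock-free is discussed above):

import Control.Concurrent (forkIO)
import Control.Concurrent.STM

main :: IO ()
main = do
  chan <- newTChanIO
  _ <- forkIO (atomically (writeTChan chan "hello from another thread"))
  -- blocks (via STM retry) until a value is available
  msg <- atomically (readTChan chan)
  putStrLn msg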

What makes Iteratees worth the complexity?

First, I understand the how of iteratees, well enough that I could probably write a simplistic and buggy implementation without referring back to any existing ones.
What I'd really like to know is why people seem to find them so fascinating, or under what circumstances their benefits justify their complexity. Comparing them to lazy I/O there is a very clear benefit, but that seems an awful lot like a straw man to me. I never felt comfortable about lazy I/O in the first place, and I avoid it except for the occasional hGetContents or readFile, mostly in very simple programs.
In real-world scenarios I generally use traditional I/O interfaces with control abstractions appropriate to the task. In that context I just don't see the benefit of iteratees, or to what task they are an appropriate control abstraction. Most of the time they seem more like unnecessary complexity or even a counterproductive inversion of control.
I've read a fair number of articles about them, and sources that make use of them, but have not yet found a compelling example that actually made me think anything along the lines of "oh, yeah, I'd have used them there too." Maybe I just haven't read the right ones. Or perhaps there is a yet-to-be-devised interface, simpler than any I've yet seen, that would make them feel less like a Swiss Army Chainsaw.
Am I just suffering from not-invented-here syndrome or is my unease well-founded? Or is it perhaps something else entirely?
As to why people find them so fascinating, I think because they're such a simple idea. The recent discussion on Haskell-cafe about a denotational semantics for iteratees devolved into a consensus that they're so simple they're barely worth describing. The phrase "little more than a glorified left-fold with a pause button" sticks out to me from that thread. People who like Haskell tend to be fond of simple, elegant structures, so the iteratee idea is likely very appealing.
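To make that phrase concrete, here is a deliberately minimal, pure sketch of the idea; it is not the API of any particular iteratee library:

-- an iteratee is a left fold that can pause and ask for more input
data Iteratee a b
  = Done b
  | NeedInput (Maybe a -> Iteratee a b)  -- Nothing signals end of input

-- a simple iteratee: count the elements it is fed
countI :: Iteratee a Int
countI = go 0
  where
    go n = NeedInput $ \mx -> case mx of
      Just _  -> go (n + 1)
      Nothing -> Done n

-- a trivial pure "enumerator": feed a list to an iteratee
runList :: [a] -> Iteratee a b -> b
runList xs0 it0 = finish (foldl step it0 xs0)
  where
    step (Done b)      _ = Done b
    step (NeedInput k) x = k (Just x)
    finish (Done b)      = b
    finish (NeedInput k) = finish (k Nothing)

main :: IO ()
main = print (runList "abc" countI)  -- prints 3

The same countI value could just as well be fed from a file handle, a socket, or several sources spliced together, which is where the composability benefits below come from.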
For me, the chief benefits of iteratees are
Composability. Not only can iteratees be composed, but enumerators can too. This is very powerful.
Safe resource usage. Resources (memory and handles mostly) cannot escape their local scope. Compare to strict I/O, where it's easier to create space leaks by not cleaning up.
Efficient. Iteratees can be highly efficient; competitive with or better than both lazy I/O and strict I/O.
I have found that iteratees provide the greatest benefits when working with a single logical piece of data that comes from multiple sources. This is when the composability is most helpful, and when resource management with strict I/O is most annoying (e.g. nested allocas or brackets).
For an example, in a work-in-progress audio editor, a single logical chunk of sound data is a set of offsets into multiple audio files. I can process that single chunk of sound by doing something like this (from memory, but I think this is right):
enumSound :: MonadIO m => Sound -> Enumerator s m a
enumSound snd = foldr (>=>) enumEof . map enumFile $ sndFiles snd
This seems clear, concise, and elegant to me, much more so than the equivalent strict I/O. Iteratees are also powerful enough to incorporate any processing I want to do, including writing output, so I find this very nice. If I used lazy I/O I could get something as elegant, but the extra care to make sure resources are consumed and GC'd would outweigh the advantages IMO.
I also like that you need to explicitly retain data in iteratees, which avoids the notorious mean xs = sum xs / fromIntegral (length xs) space leak (sketched below).
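For anyone who hasn't met that leak: sum and length each traverse the list, so the whole list stays live in memory between the two traversals. A single-pass fold, which is morally what an iteratee is, avoids it. A sketch:

-- retains all of xs: two separate traversals
mean :: [Double] -> Double
mean xs = sum xs / fromIntegral (length xs)

-- single-pass version with strict accumulators
mean' :: [Double] -> Double
mean' = go 0 0
  where
    go s n []     = s / fromIntegral (n :: Int)
    go s n (x:xs) = s `seq` n `seq` go (s + x) (n + 1) xs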
Of course, I don't use iteratees for everything. As an alternative I really like the with* idiom, but when you have multiple resources that need to be nested that gets complex very quickly.
Essentially, it's about doing IO in a functional style, correctly and efficiently. That's all, really.
Correct and efficient are easy enough using quasi-imperative style with strict IO. Functional style is easy with lazy IO, but it's technically cheating (using unsafeInterleaveIO under the hood) and can have issues with resource management and efficiency.
In very, very general terms, a lot of pure functional code follows a pattern of taking some data, recursively expanding it into smaller pieces, transforming the pieces in some fashion, then recombining it into a final result. The structure may be implicit (in the call graph of the program) or an explicit data structure being traversed.
But this falls apart when IO is involved. Say your initial data is a file handle, the "recursively expand" step is reading a line from it, and you can't read the entire file into memory at once. This forces the entire read-transform-recombine process to be done for each line before reading the next one, so instead of the clean "unfold, map, fold" structure they get mashed together into explicitly recursive monadic functions using strict IO.
Iteratees provide an alternative structure to solve the same problem. The "transform and recombine" steps are extracted and, instead of being functions, are changed into a data structure representing the current state of the computation. The "recursively expand" step is given the responsibility of obtaining the data and feeding it to an (otherwise passive) iteratee.
What benefits does this offer? Among other things:
Because an iteratee is a passive object that performs single steps of a computation, they can be easily composed in different ways--for instance, interleaving two iteratees instead of running them sequentially.
The interface between iteratees and enumerators is pure, just a stream of values being processed, so a pure function can be freely spliced in between them.
Data sources and computations are oblivious to each other's internal workings, decoupling input and resource management from processing and output.
The end result is that a program can have a high-level structure much closer to what a pure functional version would look like, with many of the same benefits to compositionality, while simultaneously having efficiency comparable to the more imperative, strict IO version.
As for being "worth the complexity"? Well, that's the thing--they're really not that complex, just a bit new and unfamiliar. The idea's been floating around for only, what, a couple years? Give it some time for things to shake out as people use iteratee-based IO in larger projects (e.g., with things like Snap), and for more examples/tutorials to appear. It's likely that, in hindsight, the current implementations will seem very rough around the edges.
Somewhat related: You may want to read this discussion about functional-style IO. Iteratees aren't mentioned all that much, but the central issue is very similar. In particular this solution, which is both very elegant and goes even further than iteratees in abstracting incremental IO.
under what circumstances their benefits justify their complexity
Every language has strict (classical) IO, where all resources are managed by the user. Haskell also provides ubiquitous lazy IO, where all resource management is delegated to the system.
However, that can create problems, as the scope of resources is dependent on runtime demand properties.
Iteratees strike a third way:
High level abstractions, like lazy IO.
Explicit, lexical scoping of resources, like strict IO.
It is justified when you have complex IO processing tasks, but very tight bounds on resource use. An example is a web server.
Indeed, Snap is built around iteratee IO on top of epoll.

Is Haskell really a purely functional language considering unsafePerformIO?

Haskell is generally referenced as an example of a purely functional language. How can this be justified given the existence of System.IO.Unsafe.unsafePerformIO ?
Edit: I thought with "purely functional" it was meant that it is impossible to introduce impure code into the functional part of the program.
The Languages We Call Haskell
unsafePerformIO is part of the Foreign Function Interface specification, not core Haskell 98 specification. It can be used to do local side effects that don't escape some scope, in order to expose a purely functional interface. That is, we use it to hide effects when the type checker can't do it for us (unlike the ST monad, which hides effects with a static guarantee).
To illustrate precisely the multiple languages that we call "Haskell", picture a set of concentric rings: each ring corresponds to a specific set of computational features, ordered by safety, with area correlating to expressive power (i.e., the number of programs you can write if you have that feature).
The language known as Haskell 98 is specified right down in the middle, admitting total and partial functions. Agda (or Epigram), where only total functions are allowed, is even less expressive, but "more pure" and safer. Haskell as we use it today, meanwhile, includes everything out to the FFI ring, where unsafePerformIO lives. That is, you can write anything in modern Haskell, though if you use things from the outer rings, it will be harder to establish the safety and security guarantees made simple by the inner rings.
So, Haskell programs are not typically built from 100% referentially transparent code, however, it is the only moderately common language that is pure by default.
I thought with "purely functional" it was meant that it is impossible to introduce impure code...
The real answer is that unsafePerformIO is not part of Haskell, any more than say, the garbage collector or run-time system are part of Haskell. unsafePerformIO is there in the system so that the people who build the system can create a pure functional abstraction over very effectful hardware. All real languages have loopholes that make it possible for system builders to get things done in ways that are more effective than dropping down to C code or assembly code.
As to the broader picture of how side effects and I/O fit into Haskell via the IO monad, I think the easiest way to think of Haskell is that it is a pure language that describes effectful computations. When the computation described is main, the run-time system carries out those effects faithfully.
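A tiny illustration of that reading (the names here are just for demonstration): an IO value is an ordinary, first-class description of effects, and nothing happens until the run-time system executes the description bound to main.

greet :: IO ()
greet = putStrLn "hello"        -- a pure value describing an effect

greetings :: [IO ()]
greetings = replicate 3 greet   -- descriptions are first-class data

main :: IO ()
main = sequence_ greetings      -- the runtime performs them, in order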
unsafePerformIO is a way to get effects in an unsafe manner; where "unsafe" means "safety must be guaranteed by the programmer"—nothing is checked by the compiler. If you are a savvy programmer and are willing to meet heavy proof obligations, you can use unsafePerformIO. But at that point you are not programming in Haskell any more; you are programming in an unsafe language that looks a lot like Haskell.
The language/implementation is purely functional. It includes a couple "escape hatches," which you don't have to use if you don't want to.
I don't think unsafePerformIO means that Haskell somehow becomes impure. You can create pure (referentially transparent) functions from impure functions.
Consider the skiplist. In order for it to work well it requires access to an RNG, an impure function, but this doesn't make the data structure impure. If you add an item and then convert it to a list, the same list will be returned every time given the item you add.
For this reason I think unsafePerformIO should be thought of as promisePureIO: a way for functions that have side effects, and would therefore be labelled impure by the type system, to become acknowledged as referentially transparent by the type system.
I understand that you have to have a slightly weaker definition of pure for this to hold, though: i.e., pure functions are referentially transparent and are never called for their side effects (like print).
Unfortunately the language has to do some real world work, and this implies talking with the external environment.
The good thing is that you can (and should) limit the usage of this "out of style" code to a few specific, well-documented portions of your program.
I have a feeling I'll be very unpopular for saying what I'm about to say, but felt I had to respond to some of the (in my opinion mis-) information presented here.
Although it's true that unsafePerformIO was officially added to the language as part of the FFI addendum, the reasons for this are largely historical rather than logical. It existed unofficially and was widely used long before Haskell ever had an FFI. It was never officially part of the main Haskell standard because, as you have observed, it was just too much of an embarrassment. I guess the hope was that it would just go away at some point in the future, somehow. Well that hasn't happened, nor will it in my opinion.
The development of the FFI addendum provided a convenient pretext for unsafePerformIO to get snuck into the official language standard, as it probably doesn't seem too bad here when compared to adding the capability to call foreign (i.e. C) code (where all bets are off regarding statically ensuring purity and type safety anyway). It was also jolly convenient to put it here for what are essentially political reasons. It fostered the myth that Haskell would be pure, if only it weren't for all that dirty "badly designed" C, or "badly designed" operating systems, or "badly designed" hardware, or... whatever. It's certainly true that unsafePerformIO is regularly used with FFI-related code, but the reasons for this are often more to do with bad design of the FFI, and indeed of Haskell itself, than with bad design of whatever foreign thing Haskell is trying to interface to.
So as Norman Ramsey says, the official position came to be that it is OK to use unsafePerformIO provided certain proof obligations are satisfied by whoever uses it (primarily that doing so doesn't invalidate important compiler transformations like inlining and common sub-expression elimination). So far so good, or so one might think. The real kicker is that these proof obligations cannot be satisfied by what is probably the single most common use case for unsafePerformIO, which by my estimate accounts for well over 50% of all the unsafePerformIOs out there in the wild. I'm talking about the appalling idiom known as the "unsafePerformIO hack" (sketched below), which is provably (in fact obviously) completely unsafe in the presence of inlining and CSE.
I don't really have the time, space or inclination to go into why the "unsafePerformIO hack" is needed in real IO libraries, but the bottom line is that folk who work on Haskell's IO infrastructure are usually "stuck between a rock and a hard place". They can either provide an inherently safe API which has no safe implementation (in Haskell), or they can provide an inherently unsafe API which can be safely implemented, but what they can rarely do is provide safety in both API design and implementation. Judging by the depressing regularity with which the "unsafePerformIO hack" appears in real-world code (including the Haskell standard libraries), it seems most choose the former option as the lesser of the two evils, and just hope that the compiler won't muck things up with inlining, CSE, or any other transformation.
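For readers who haven't seen it, the idiom usually looks something like this (a sketch; the names are illustrative). A top-level mutable cell is conjured with unsafePerformIO, and a NOINLINE pragma stands in for proof obligations that cannot actually be met:

import Data.IORef
import System.IO.Unsafe (unsafePerformIO)

-- global mutable state smuggled into pure top-level scope; NOINLINE is
-- a plea to the compiler not to duplicate the initializer via inlining
-- or merge it with another via CSE, but nothing verifies the plea
{-# NOINLINE globalCounter #-}
globalCounter :: IORef Int
globalCounter = unsafePerformIO (newIORef 0)

bump :: IO Int
bump = atomicModifyIORef' globalCounter (\n -> (n + 1, n + 1))

main :: IO ()
main = bump >> bump >>= print   -- prints 2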
I do wish all this was not so. Unfortunately, it is.
Safe Haskell, a recent extension of GHC, gives a new answer to this question. unsafePerformIO is a part of GHC Haskell, but not a part of the safe dialect.
unsafePerformIO should be used only to build referentially transparent functions; for example, memoization. In these cases, the author of a package marks it as "trustworthy". A safe module can import only safe and trustworthy modules; it cannot import unsafe modules.
For more information: GHC manual, Safe Haskell paper
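As an illustration of the memoization case mentioned above (a sketch, assuming the cache insert is the only effect, so a result depends solely on the argument; the names are illustrative):

import Data.IORef
import qualified Data.Map.Strict as Map
import System.IO.Unsafe (unsafePerformIO)

{-# NOINLINE fibCache #-}
fibCache :: IORef (Map.Map Int Integer)
fibCache = unsafePerformIO (newIORef Map.empty)

-- observably pure: same argument, same result; a racing thread at
-- worst recomputes a value rather than observing anything impure
fib :: Int -> Integer
fib n = unsafePerformIO $ do
  cache <- readIORef fibCache
  case Map.lookup n cache of
    Just v  -> return v
    Nothing -> do
      let v | n < 2     = fromIntegral n
            | otherwise = fib (n - 1) + fib (n - 2)
      v `seq` modifyIORef' fibCache (Map.insert n v)
      return v

main :: IO ()
main = print (fib 40)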
The IO monad is actually defined in Haskell, and you can in fact see its definition here. This monad does not exist to deal with impurities but rather to handle side effects. In any case, you could actually pattern match your way out of the IO monad, so the existence of unsafePerformIO shouldn't really be troubling to you.
