I am going through Haskell code to see how I could write similar stream fusion functions and I noticed a funny syntax construct, {-# ... #-}, that I've not come across; so I would like to know what it is and how I can find out how it works:
-- | /O(n)/ Drop elements that do not satisfy the predicate
filter :: Vector v a => (a -> Bool) -> v a -> v a
{-# INLINE filter #-}
filter f = unstream . inplace (MStream.filter f) . stream
More specifically, what does this particular line do?
{-# INLINE filter #-}
GHC has a "pragma" system which allows you to specify extra-linguistic information to GHC. In particular, they look like
{-# <NAME> <ARGS...> #-}
The most common ones you will see are the language extension pragmas, which must go at the top of a file and affect the language extensions in effect for the remainder of the file.
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE ScopedTypeVariables #-}
module Example where
Normally it ought to be that ignoring a pragma does not affect the meaning of the program. This is broadly true for pragmas like INLINE as they are merely hints to the compiler that this function's body should be inlined wherever it is called in order to open up new optimization opportunities. Haskell semantics gives us guarantees about when such inlining transformations do not change the meaning of the program, thus the choice the compiler makes about whether or not to inline has no effect on the meaning of the program (so long as it doesn't violate the assumptions of those guarantees).
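As a minimal sketch of that point (keepIf and evens are made-up names for illustration, not part of the vector library), the INLINE pragma below can be deleted without changing what the program prints; at most the generated code differs:

```haskell
module Main where

-- Hypothetical helper: keep the elements that satisfy the predicate.
-- The pragma asks GHC to inline the body at call sites; removing it
-- does not change the result, only (possibly) the generated code.
keepIf :: (a -> Bool) -> [a] -> [a]
{-# INLINE keepIf #-}
keepIf p = foldr (\x acc -> if p x then x : acc else acc) []

-- A call site where the inlined body can be specialised to 'even'.
evens :: [Int] -> [Int]
evens = keepIf even

main :: IO ()
main = print (evens [1 .. 10])
```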
The LANGUAGE pragmas are a bit different in that they specify exactly what language is being written in for the rest of the file. For instance, we typically assume the base language is Haskell98 or Haskell2010 and the LANGUAGE pragmas add extensions, such that the language of the file with the header shown earlier is
Haskell2010 + RankNTypes + FlexibleInstances + ScopedTypeVariables
but beyond hinting to the compiler which language is being written these pragmas have no further meaning.
The full set of allowable pragmas depends upon the compiler being used. GHC's pragmas are listed here (note that this link is for version 7.6.3 while the link in the comments is for 7.0.3). Use of pragmas other than LANGUAGE may be sketchy and platform specific, so learn their use and meaning carefully.
For instance, there's a big debate about whether or not library authors should use INLINE as it tends to suggest a lack of faith in GHC's own inlining heuristics and thus that we should spend more effort tightening those up rather than littering code with manual INLINEs. But that said, INLINE and INLINABLE can have profound impact on tight inner loops if used judiciously.
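To make the difference between the two pragmas concrete, here is a small sketch with hypothetical functions (sumSquares and square are not from any library). INLINABLE keeps the definition available to other modules so GHC may specialise or inline it using its own heuristics, while INLINE asks for inlining much more insistently:

```haskell
module Main where

import Data.List (foldl')

-- Hypothetical overloaded function. INLINABLE does not force inlining;
-- it keeps the definition available so that call sites (including ones
-- in other modules) can be specialised to a concrete Num instance.
sumSquares :: Num a => [a] -> a
{-# INLINABLE sumSquares #-}
sumSquares = foldl' (\acc x -> acc + x * x) 0

-- A tiny wrapper is the typical candidate for a plain INLINE.
square :: Num a => a -> a
{-# INLINE square #-}
square x = x * x

main :: IO ()
main = do
  print (sumSquares [1, 2, 3 :: Int])
  print (square (1.5 :: Double))
```

Whether either pragma actually helps is something to measure; the program means the same thing with both removed.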
It's a pragma. It's basically something not expressible in the language standard itself, but still saying something relevant to the compiler.
Some of these pragmas are essentially optional and only affect, for example, performance, hence the comment-like look. In your example, INLINE means the compiler should try hard to not just link to the function in question, but actually "hard-code" it anywhere it's called. This does not in principle change program semantics, but can have quite an impact on performance and memory usage (in particular if combined with extra stream fusion etc. techniques).
Related
Is it possible in Haskell to apply a language pragma to a block of code, rather than the entire file itself?
For example, I enable the -fwarn-monomorphism-restriction flag, but I have a couple of files where I'd really like to disable this flag, so I use {-# LANGUAGE NoMonomorphismRestriction #-} at the top of the file.
However, instead of applying this pragma to the entire module, I'd like to apply it only to the block of code where I don't think this warning is helpful. The only solution I can think of right now is to move this block of code to its own file and then import it.
In general there is no way to do this, no.
For this particular pragma, you can disable the monomorphism restriction for a single declaration by giving it a type signature. Although I strongly recommend giving a full signature, there may be some situation where that is undesirable for some reason; in such a case even a signature full of holes is sufficient, e.g.
{-# LANGUAGE PartialTypeSignatures #-}
x :: _ => _
x = (+)
will be inferred to have type Num a => a -> a -> a instead of Integer -> Integer -> Integer.
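Here is the same idea as a complete module, as a sketch only. The name plus is made up, and the OPTIONS_GHC line is an assumption on my part: it just silences the warnings GHC prints when it fills in the holes. Using plus at two different types shows that it was generalised rather than defaulted to Integer:

```haskell
{-# LANGUAGE PartialTypeSignatures #-}
{-# OPTIONS_GHC -Wno-partial-type-signatures #-}
module Main where

-- The holes let GHC infer both the constraint and the type, and the
-- presence of a signature means the monomorphism restriction does not
-- apply to this one binding.
plus :: _ => _
plus = (+)

main :: IO ()
main = do
  print (plus (1 :: Int) 2)       -- used at Int
  print (plus (1.5 :: Double) 2)  -- and at Double
```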
I'm encountering the following hard to understand behavior from GHC-8.2.2
I have some overlapping typeclass instances. No incoherent typeclass instances. There's a certain typeclass instance of the form, roughly,
instance (C f h, C g h) => D1 (D2 f g) h
where C has overlapping instances. When I load my project into stack repl, all is well. The context of this instance is resolved to the instances of C I'm looking for.
If I create a file test.hs which imports a datatype falling under the instance above, all is not well. When I load that into stack repl, and call an operation of D1, then it's clear that the context of the instance of D1 is being resolved to the "wrong" instance of C.
What's especially strange is that if I add test.hs to my project as an exposed module, then reload it into the repl with no other changes, then the context of the instance above is resolved to the "right" instance of C.
I do not see this behavior with GHC-7.10.3, 8.0.1, 8.0.2, or with 8.4.3. Perhaps this is related to this bug?
But I'm not using incoherent instances, which is what that bug seems to involve. I am using a fair number of language extensions in the module where the instance above occurs:
{-#LANGUAGE TypeFamilies, UndecidableInstances, FlexibleInstances, MultiParamTypeClasses, FunctionalDependencies, GADTs, DataKinds, PolyKinds, TypeOperators, PatternSynonyms, RankNTypes, FlexibleContexts, ScopedTypeVariables, DefaultSignatures #-}
A minimal example of the behavior can be found at GHC-Repro. Run test.sh to see the phenomenon. What I would like to know is:
Whether this might conceivably be intended behavior by GHC, and I'm just doing something wrong.
If I am doing something wrong, what I might do to select the "right" instance when importing stuff from my project into other projects.
This issue is now being tracked at: https://ghc.haskell.org/trac/ghc/ticket/15599
I saw some Haskell source code, and at the beginning of the source file, it included several things like:
{-# LANGUAGE DeriveFunctor #-}
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE DeriveTraversable #-}
{-# LANGUAGE DeriveFoldable #-}
I know that comments in Haskell start with {- and end with -}, but this is clearly something else. What's the purpose of this? It seems pretty similar to include statements or macros in C.
The purpose is to enable language extensions. It's a compiler pragma. The GHC compiler supports a lot of language extensions. The GHC manual provides explanation of each extension and examples.
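As a rough illustration of what those four extensions buy you (the Tree type and the function names here are invented for the example), each pragma unlocks one piece of syntax or one deriving clause that plain Haskell2010 would reject:

```haskell
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE DeriveFoldable #-}
{-# LANGUAGE DeriveFunctor #-}
{-# LANGUAGE DeriveTraversable #-}
module Main where

-- The Derive* extensions let GHC write these three instances for us.
data Tree a = Leaf | Node (Tree a) a (Tree a)
  deriving (Show, Eq, Functor, Foldable, Traversable)

-- BangPatterns allows the !acc syntax, which forces the accumulator
-- at every step instead of building up a chain of thunks.
sumStrict :: [Int] -> Int
sumStrict = go 0
  where
    go !acc []       = acc
    go !acc (x : xs) = go (acc + x) xs

example :: Tree Int
example = Node (Node Leaf 1 Leaf) 2 (Node Leaf 3 Leaf)

main :: IO ()
main = do
  print (fmap (* 10) example)                 -- Functor
  print (sum example)                         -- Foldable
  print (traverse (\x -> Just (x + 1)) example) -- Traversable
  print (sumStrict (foldr (:) [] example))    -- BangPatterns
```

Removing any one of the pragmas should make GHC reject the corresponding deriving clause or pattern and suggest the extension to enable.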
Why does this code require the ScopedTypeVariables extension?
{-# LANGUAGE ScopedTypeVariables #-}
char = case Just '3' of
  Just (x :: Char) -> x
  Nothing -> '?'
When I read the documentation on ScopedTypeVariables, it seems to mean unifying type variables in the function body with the parent function signature. This code snippet isn't unifying any type variables though!
Also what is the effect of loading ScopedTypeVariables without also loading ExplicitForAll? All the other use cases of ScopedTypeVariables seem to require ExplicitForAll to actually work. But in the above snippet, there's no ExplicitForAll.
ScopedTypeVariables enables ExplicitForAll automatically. For the sake of your sanity I would suggest always using ScopedTypeVariables when using any other type system extensions (except possibly the ones dealing only with classes/instances/contexts) and never using ExplicitForAll directly.
The reason ScopedTypeVariables is required for pattern variable signatures is just that such signatures are part of the extension. Among other uses, they give you a way to bring a type variable into scope. For example:
f (Just (x::a)) = bob
  where
    bob::[a]
    bob = [x]
I do not know why the pattern signatures are part of ScopedTypeVariables, per se; most likely they were created for this purpose, and all the code was written in one go. Teasing them apart to make orthogonal extensions was almost certainly considered more trouble than it was worth.
Edit
Actually, there's a pretty good reason. With the exception of pattern bindings, a type variable in a pattern signature is generalized, bringing that variable into scope. So in the above example, you don't need to know if there is an a in the outer type environment. If you could enable pattern signatures without scoped type variables, then the variable could be generalized or not depending on whether scoped type variables were turned on or not. The same sort of confusion happens for ExplicitForAll without ScopedTypeVariables, which is why I'd like to kill that extension and make ScopedTypeVariables the default, or at least turn it on automatically with the extensions that currently enable ExplicitForAll.
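To put the two ways of scoping a type variable side by side, here is a small sketch (pairUp and wrap are made-up names). The first function uses an explicit forall in the signature, the second uses a pattern signature; I have given wrap a top-level signature as well, which is my own cautious choice rather than a requirement stated above:

```haskell
{-# LANGUAGE ScopedTypeVariables #-}
module Main where

-- The explicit forall brings 'a' into scope for the whole body, so the
-- local signature below refers to the same 'a' as the outer one.
pairUp :: forall a. [a] -> [(a, a)]
pairUp xs = zip xs shifted
  where
    shifted :: [a]
    shifted = drop 1 xs

-- The pattern signature (x :: a) binds a new name 'a' for the element
-- type, which the local signature of 'bob' can then mention.
wrap :: Maybe b -> [b]
wrap (Just (x :: a)) = bob
  where
    bob :: [a]
    bob = [x]
wrap Nothing = []

main :: IO ()
main = do
  print (pairUp "abc")
  print (wrap (Just 'c'))
```

Without the extension, the local signatures would each introduce a fresh, unrelated type variable and the module would not typecheck.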
Occasionally, a piece of code I want to write isn't legal without at least one language extension. This is particularly true when trying to implement ideas in research papers, which tend to use whichever spiffy, super-extended version of GHC was available at the time the paper was written, without making it clear which extensions are actually required.
The result is that I often end up with something like this at the top of my .hs files:
{-# LANGUAGE TypeFamilies
, MultiParamTypeClasses
, FunctionalDependencies
, FlexibleContexts
, FlexibleInstances
, UndecidableInstances
, OverlappingInstances #-}
I don't mind that, but often I feel as though I'm making blind sacrifices to appease the Great God of GHC. It complains that a certain piece of code isn't valid without language extension X, so I add a pragma for X. Then it demands that I enable Y, so I add a pragma for Y. By the time this finishes, I've enabled three or four language extensions that I don't really understand, and I have no idea which ones are 'safe'.
To explain what I mean by 'safe':
I understand that UndecidableInstances is safe, because although it may cause the compiler to not terminate, as long as the code compiles it won't have unexpected side effects.
On the other hand, OverlappingInstances is clearly unsafe, because it makes it very easy for me to accidentally write code that gives runtime errors.
Is there a list of GHC extensions which are considered 'safe' and which are 'unsafe'?
It's probably best to look at what SafeHaskell allows:
Safe Language
The Safe Language (enabled through -XSafe) restricts things in two different ways:
Certain GHC LANGUAGE extensions are disallowed completely.
Certain GHC LANGUAGE extensions are restricted in functionality.
Below is precisely what flags and extensions fall into each category:
Disallowed completely: GeneralizedNewtypeDeriving, TemplateHaskell
Restricted functionality: OverlappingInstances, ForeignFunctionInterface, RULES, Data.Typeable
See Restricted Features below
Doesn't Matter: all remaining flags.
Restricted and Disabled GHC Haskell Features
In the Safe language dialect we restrict the following Haskell language features:
ForeignFunctionInterface: This is mostly safe, but foreign import declarations that import a function with a non-IO type are disallowed. All FFI imports must reside in the IO Monad.
RULES: As they can change the behaviour of trusted code in unanticipated ways, violating semantic consistency they are restricted in function. Specifically any RULES defined in a module M compiled with -XSafe are dropped. RULES defined in trustworthy modules that M imports are still valid and will fire as usual.
OverlappingInstances: This extension can be used to violate semantic consistency, because malicious code could redefine a type instance (by containing a more specific instance definition) in a way that changes the behaviour of code importing the untrusted module. The extension is not disabled for a module M compiled with -XSafe but restricted. While M can define overlapping instance declarations, they can only be used in M. If in a module N that imports M, at a call site that uses a type-class function there is a choice of which instance to use (i.e. overlapping) and the most specific choice is from M (or any other Safe compiled module), then compilation will fail. It is irrelevant if module N is considered Safe, or Trustworthy or neither.
Data.Typeable: We allow instances of Data.Typeable to be derived but we don't allow hand crafted instances. Derived instances are machine generated by GHC and should be perfectly safe but hand crafted ones can lie about their type and allow unsafe coercions between types. This is in the spirit of the original design of SYB.
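To make the OverlappingInstances item more concrete, here is a small sketch of what "a more specific instance" means. It uses the per-instance OVERLAPPABLE and OVERLAPPING pragmas that newer GHC versions offer in place of the module-wide extension; the Describe class is invented for the example:

```haskell
{-# LANGUAGE FlexibleInstances #-}
module Main where

-- Hypothetical class used only for this illustration.
class Describe a where
  describe :: a -> String

-- General instance: any list. OVERLAPPABLE says a more specific
-- instance is allowed to take precedence over this one.
instance {-# OVERLAPPABLE #-} Describe [a] where
  describe xs = "list of " ++ show (length xs) ++ " items"

-- More specific instance: lists of Char. At type String both
-- instances match, and GHC picks this one.
instance {-# OVERLAPPING #-} Describe [Char] where
  describe s = "string " ++ show s

main :: IO ()
main = do
  putStrLn (describe [1, 2, 3 :: Int])
  putStrLn (describe "hi")
```

The safety concern in the quoted text is exactly this mechanism used across module boundaries: an imported module that adds a more specific instance changes which code runs at your call site.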
In the Safe language dialect we disable completely the following Haskell language features:
GeneralizedNewtypeDeriving: It can be used to violate constructor access control, by allowing untrusted code to manipulate protected data types in ways the data type author did not intend. I.e. it can be used to break invariants of data structures.
TemplateHaskell: Is particularly dangerous, as it can cause side effects even at compilation time and can be used to access abstract data types. It is very easy to break module boundaries with TH.
I recall having read that the interaction of FunctionalDependencies and UndecidableInstances can also be unsafe, because beyond allowing an unlimited context stack depth UndecidableInstances also lifts the so-called coverage condition (section 7.6.3.2), but I can't find a citation for this at the moment.
EDIT 2015-10-27: Ever since GHC gained support for type roles, GeneralizedNewtypeDeriving is no longer unsafe. (I'm not sure what else might have changed.)
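For readers who have not met GeneralizedNewtypeDeriving, this is the ordinary, harmless use of it, sketched with a made-up Age type: the newtype simply reuses the Num instance of the type it wraps instead of you writing the instance by hand.

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
module Main where

-- Hypothetical newtype. Show, Eq and Ord can be derived in standard
-- Haskell; deriving Num is what needs the extension, and it reuses the
-- Num Int instance for Age.
newtype Age = Age Int
  deriving (Show, Eq, Ord, Num)

birthday :: Age -> Age
birthday age = age + 1  -- the literal 1 is an Age via the derived Num

main :: IO ()
main = print (birthday (Age 30))
```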