I've been playing with Haskell's -XDataKinds feature quite a lot recently, and have found myself wanting to create a kind.
I'm not sure if my wishes can come true, but Edward Kmett's constraints package seems to use a declared kind Constraint (with sort BOX), which is said to be defined in GHC.Prim, though I couldn't find it there.
Is there any way to declare a kind in Haskell or GHC, manually? This would probably need manual assertion that data types declared with data would be of the proper kind. My idea is something like the following:
data Foo :: BOX
data Bar a :: Foo where
  Bar :: a -> Bar a
In current GHC (7.8 at time of writing), one cannot separate the declaration of a fresh kind from the declaration of its type-level inhabitants.
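To illustrate that point, here is a minimal sketch of how DataKinds ties a kind to an ordinary data declaration: the kind and its type-level inhabitants come from one and the same declaration. The names DoorState, Door, close and doorName are made up for this example.

```haskell
{-# LANGUAGE DataKinds, KindSignatures #-}

-- An ordinary data declaration. With DataKinds it is also promoted:
-- DoorState becomes a kind, and 'Opened and 'Closed become types of that kind.
-- There is no separate declaration for the kind itself.
data DoorState = Opened | Closed

-- A type indexed by the promoted kind. The parameter s is a phantom.
newtype Door (s :: DoorState) = MkDoor String

-- Only an opened door can be closed; the index changes in the type.
close :: Door 'Opened -> Door 'Closed
close (MkDoor name) = MkDoor name

doorName :: Door s -> String
doorName (MkDoor name) = name

main :: IO ()
main = putStrLn (doorName (close (MkDoor "front")))
```

Writing Door Int is rejected with a kind error, because Int has kind * rather than DoorState.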
I understand that newtype erases the type constructor at compile time as an optimization, so that newtype Foo = Foo Int results in just an Int. In other words, I am not asking this question. My question is not about what newtype does.
Instead, I'm trying to understand why the compiler can't simply apply this optimization itself when it sees a single-value data constructor. When I use hlint, it's smart enough to tell me that a single-value data constructor should be a newtype. (I never make this mistake, but tried it out to see what would happen. My suspicions were confirmed.)
One objection could be that without newtype, we couldn't use GeneralizedNewtypeDeriving and other such extensions. But that's easily solved. If we say…
data Foo m a b = Foo a (m b) deriving (Functor, Applicative, Monad)
The compiler can just barf and tell us of our folly.
Why do we need newtype when the compiler can always figure it out for itself?
It seems plausible that newtype started out mostly as a programmer-supplied annotation to perform an optimization that compilers were too stupid to figure out on their own, sort of like the register keyword in C.
However, in Haskell, newtype isn't just an advisory annotation for the compiler; it actually has semantic consequences. The types:
newtype Foo = Foo Int
data Bar = Bar Int
declare two non-isomorphic types. Specifically, Foo undefined and undefined :: Foo are equivalent while Bar undefined and undefined :: Bar are not, with the result that:
Foo undefined `seq` "not okay" -- is an exception
Bar undefined `seq` "okay" -- is "okay"
and
case undefined of Foo n -> "okay" -- is okay
case undefined of Bar n -> "not okay" -- is an exception
As others have noted, if you make the data field strict:
data Baz = Baz !Int
and take care to only use irrefutable pattern matches, then Baz acts just like the newtype Foo:
Baz undefined `seq` "not okay" -- exception, like Foo
case undefined of ~(Baz n) -> "okay" -- is "okay", like Foo
In other words, if my grandmother had wheels, she'd be a bike!
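The behaviours listed above can be checked with a small program. This is only a sketch: the helper terminates and the list observations are names invented here, and each expression is forced to weak head normal form with evaluate so that any exception can be caught.

```haskell
import Control.Exception (SomeException, evaluate, try)

newtype Foo = Foo Int
data Bar = Bar Int
data Baz = Baz !Int

-- | Force a value to weak head normal form.
-- True if that succeeds, False if an exception is thrown.
terminates :: a -> IO Bool
terminates x = either failed (const True) <$> try (evaluate x)
  where
    failed :: SomeException -> Bool
    failed _ = False

-- | Each expression from the answer, paired with whether it terminates.
observations :: IO [(String, Bool)]
observations = traverse run
  [ ("Foo undefined `seq` ()",            Foo undefined `seq` ())
  , ("Bar undefined `seq` ()",            Bar undefined `seq` ())
  , ("case undefined of Foo _ -> ()",     (case undefined of Foo _ -> ()))
  , ("case undefined of Bar _ -> ()",     (case undefined of Bar _ -> ()))
  , ("Baz undefined `seq` ()",            Baz undefined `seq` ())
  , ("case undefined of ~(Baz _) -> ()",  (case undefined of ~(Baz _) -> ()))
  ]
  where
    run (label, value) = (,) label <$> terminates value

main :: IO ()
main = observations >>= mapM_ report
  where
    report (label, ok) =
      putStrLn (label ++ ": " ++ (if ok then "returns" else "throws"))
```

The Foo and Baz rows should agree with each other, while the Bar rows show the opposite result in both the seq and the case tests.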
So, why can't the compiler simply apply this optimization itself when it sees a single-value data constructor? Well, it can't perform this optimization in general without changing the semantics of a program, so it needs to first prove that the semantics are unchanged if a particular arbitrary, one-constructor, one-field data type is made strict in its field and matched irrefutably instead of strictly. Since this depends on how values of the type are actually used, this can be hard to do for data types exported by a module, especially at function call boundaries, but the existing optimization mechanisms for specialization, inlining, strictness analysis, and unboxing often perform equivalent optimizations in chunks of self-contained code, so you may get the benefits of a newtype even when you use a data type by accident. In general, though, it seems to be too hard a problem for the compiler to solve, so the burden of remembering to newtype things is left on the programmer.
This leads to the obvious question -- why can't we change the semantics so they're equivalent; why are the semantics of newtype and data different in the first place?
Well, the reason for the newtype semantics seems pretty obvious. As a result of the nature of the newtype optimization (erasure of the type and constructor at compile time), it becomes impossible -- or at the very least exceedingly difficult -- to separately represent Foo undefined and undefined :: Foo in compiled code, which explains the equivalence of these two values. Consequently, irrefutable matching is an obvious further optimization when there's only one possible constructor and there's no possibility that that constructor isn't present (or at least no possibility of distinguishing between presence and absence of the constructor, because the only case where this could happen is in distinguishing between Foo undefined and undefined :: Foo, which we've already said can't be distinguished in compiled code).
The reason for the semantics of a one-constructor, one-field data type (in the absence of strictness annotations and irrefutable matches) is maybe less obvious. However, these semantics are entirely consistent with data types having constructor and/or field counts other than one, while the newtype semantics would introduce an arbitrary inconsistency between this one special case of a data type and all others.
Because of this historical distinction between data and newtype types, a number of subsequent extensions have treated them differently, further entrenching different semantics. You mention GeneralizedNewtypeDeriving, which works on newtypes but not one-constructor, one-field data types. There are further differences in calculation of representational equivalence used for safe coercions (i.e., Data.Coerce) and DerivingVia, the use of existential quantification or more general GADTs, the UNPACK pragma, etc. There are also some differences in the way types are represented in generics, though now that I look at them more carefully, they seem pretty superficial.
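As a small sketch of two of the newtype-only facilities mentioned above, the following uses GeneralizedNewtypeDeriving to reuse Int's Num instance and Data.Coerce to convert a whole list without traversing it. The type Age and the values ages and total are hypothetical names chosen for illustration.

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
import Data.Coerce (coerce)

-- Num is not a stock-derivable class; it is derived here by reusing
-- the instance of the wrapped type, which only works for a newtype.
newtype Age = Age Int
  deriving (Show, Eq, Ord, Num)

-- coerce converts [Int] to [Age] because Age and Int share a representation.
ages :: [Age]
ages = coerce [30, 41 :: Int]

-- sum uses the derived Num instance.
total :: Age
total = sum ages

main :: IO ()
main = print total
```

Changing newtype to data in this sketch makes both the Num deriving clause and the coerce call fail to compile.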
Even if newtypes were an unnecessary historical mistake that could have been replaced by special-casing certain data types, it's a little late to put the genie back in the bottle.
Besides, newtypes don't really strike me as unnecessary duplication of an existing facility. To me, data and newtype types are conceptually quite different. A data type is an algebraic, sum-of-products type, and it's just coincidence that a particular special case of algebraic types happens to have one constructor and one field and so ends up being (nearly) isomorphic to the field type. In contrast, a newtype is intended from the start to be an isomorphism of an existing type, basically a type alias with an extra wrapper to distinguish it at the type level and allow us to pass around a separate type constructor, attach instances, and so on.
This is an excellent question. Semantically,
newtype Foo = Foo Int
is identical to
data Foo' = Foo' !Int
except that pattern matching on the former is lazy and on the latter is strict. So a compiler certainly could compile them the same, and adjust the compilation of pattern matching to keep the semantics right.
For a type like you've described, that optimization isn't really all that critical in practice, because users can just use newtype and sprinkle in seqs or bang patterns as needed. Where it would get a lot more useful is for existentially quantified types and GADTs. That is, we'd like to get the more compact representation for types like
data Baz a b where
  Baz :: !a -> Baz a Bool
data Quux where
  Quux :: !a -> Quux
But GHC doesn't currently offer any such optimization, and doing so would be somewhat trickier in these contexts.
Why do we need newtype when the compiler can always figure it out for itself?
It can’t. data and newtype have different semantics: data adds an additional level of indirection, while newtype has exactly the same representation as its wrapped type. Matching on a newtype constructor is always lazy, whereas with data you choose whether each field is lazy or strict, using strictness annotations (!) or extensions like StrictData.
Likewise, a compiler doesn’t always know for certain when data can be replaced with newtype. Strictness analysis allows it to conservatively determine when it may remove unnecessary laziness around things that will always be evaluated; in this case it can effectively remove the data wrapper locally. GHC does something similar when removing extra boxing & unboxing in a chain of operations on a boxed numeric type like Int, so it can do most of the calculations on the more efficient unboxed Int#. But in general (that is, without global optimisation) it can’t know whether some code is relying on that thunk’s being there.
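To make the Int versus Int# remark concrete, here is a hedged sketch that writes by hand the kind of unboxed arithmetic GHC aims for after its boxing and unboxing optimisations. The function names are invented for this example, and the second definition is only an illustration of the idea, not actual compiler output.

```haskell
{-# LANGUAGE MagicHash #-}
import GHC.Exts (Int (I#), (*#), (+#))

-- Ordinary boxed arithmetic: every Int is a heap object wrapping an Int#.
sumSquares :: Int -> Int -> Int
sumSquares x y = x * x + y * y

-- The same computation with the boxes removed by hand: unwrap once,
-- compute on raw Int# values, and rebox only the final result.
sumSquaresUnboxed :: Int -> Int -> Int
sumSquaresUnboxed (I# x) (I# y) = I# ((x *# x) +# (y *# y))

main :: IO ()
main = print (sumSquares 3 4, sumSquaresUnboxed 3 4)
```

Note that matching on I# forces both arguments, which is exactly the kind of strictness the compiler has to prove safe before doing this on its own.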
So HLint offers this as a suggestion because usually you don’t need the “extra” wrapper at runtime, but other times it’s essential. The advice is just that: advice.
I would like to create a frontend for a simple language that would produce GHC Core. I would like to then take this output and run it through the normal GHC pipeline. According to this page, it is not directly possible from the ghc command. I am wondering if there is any way to do it.
I am ideally expecting a few function calls to the ghc-api but I am also open to any suggestions that include (not-so-extensive) hacking in the source of GHC. Any pointers would help!
Note that Core is an explicitly typed language, which can make it quite difficult to generate from other languages (the GHC type checker has inferred all the types so it's no problem there). For example, the usual identity function (id = \x -> x :: forall a. a -> a) becomes
id = \(a :: *) (x :: a) -> x
where a is a type variable of kind *. It is a term-level place-holder for the type-level forall binding. Similarly, when calling id you need to give it a type as its first argument, so the Haskell expression (id 42) gets translated into (id Int 42). Such type bindings and type applications won't be present in the generated machine code, but they are useful to verify compiler transformations are correct.
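The explicit type argument can be imitated in source Haskell with the TypeApplications extension, which gives a rough feel for what Core spells out everywhere. This is a sketch only: identity and fortyTwo are names made up here, and the Core shown in the comments is approximate rather than copied from compiler output.

```haskell
{-# LANGUAGE ExplicitForAll, TypeApplications #-}

-- In Core this is roughly: identity = \ (@a) (x :: a) -> x
identity :: forall a. a -> a
identity x = x

-- The type argument that Core always passes is written explicitly here.
-- Without the @Int, GHC infers it and inserts it during type checking.
fortyTwo :: Int
fortyTwo = identity @Int 42

main :: IO ()
main = print fortyTwo
```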
On the bright side, it might be possible to just generate Haskell -- if you can generate the code in such a way that GHC will always be able to determine its type then you are essentially just using a tiny subset of Haskell. Whether this can work depends very much on your source language, though.
There's still no way to read External Core files, whether via the ghc command or the API. Sorry :(
It's probably theoretically possible to build the Core syntax tree up from your representation using the GHC API, but that sounds very painful. I would recommend targeting some other backend. You don't necessarily have to stop using GHC; straightforward Haskell with unboxed types and unsafeCoerce lets you get pretty close to the resulting Core, so you could define your own simple "Core-ish" language and compile it to that. (Indeed, you could probably even compile GHC Core itself, but that's a bit too meta for my tastes.)
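A minimal sketch of the "compile to plain Haskell" route suggested above might look like the following. Everything here is hypothetical: Expr is a made-up toy language, and render and renderDecl simply print it as Haskell source text that could then be handed to ghc. It leaves out types, unboxed values and unsafeCoerce entirely.

```haskell
-- A toy source language (hypothetical, for illustration only).
data Expr
  = Var String
  | Lit Int
  | Lam String Expr
  | App Expr Expr
  | Add Expr Expr

-- | Print an expression as Haskell source. Everything is parenthesised,
-- so no precedence handling is needed and negative literals stay valid.
render :: Expr -> String
render (Var x)   = x
render (Lit n)   = "(" ++ show n ++ ")"
render (Lam x b) = "(\\" ++ x ++ " -> " ++ render b ++ ")"
render (App f a) = "(" ++ render f ++ " " ++ render a ++ ")"
render (Add a b) = "(" ++ render a ++ " + " ++ render b ++ ")"

-- | Print a top-level binding.
renderDecl :: String -> Expr -> String
renderDecl name body = name ++ " = " ++ render body

main :: IO ()
main = putStrLn (renderDecl "answer" (App (Lam "x" (Add (Var "x") (Lit 1))) (Lit 41)))
```

The generated text relies on GHC's type inference, so this approach only works for source languages whose programs GHC can always type on its own.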
I am trying to work with GHC core data types.
I am able to compile my Haskell source to core representation with type Bind CoreBndr.
As we know there is no default Show instance for this data type.
There is a way to pretty print this representation but it has way too much noise associated with it.
I want to treat GHC core as any other algebraic data type and write functions with it.
It would be much easier if we had a Show instance of GHC core.
Has anybody already written a show instance which I can reuse?
Aside, how does the community write and verify programs that deal with GHC core?
A naive implementation of Show for GHC's types is probably not what you want. The reason is that GHC's internal data types are mutually recursive. For instance, between TyCon, AlgTyConRhs, and DataCon we have:
TyCon has AlgTyCon, which contains AlgTyConRhs.
AlgTyConRhs contains data_cons :: [DataCon] as one of its record fields.
DataCon contains dcRepTyCon :: TyCon as one of its fields.
And thus we come full circle. Because of how Show works, recursion like this will create infinite output if you ever attempt to print it.
In order to get a "nice" custom representation with data constructors and everything showing, you would have to write it yourself. This is actually somewhat challenging, since you have to consider and debug cases of recursion like this that default pretty printers have solved.
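The cycle can be reproduced with a pair of toy types. The sketch below does not use GHC's real TyCon and DataCon; ToyTyCon, ToyDataCon and the two printers are stand-ins invented for this example. A derived Show instance on these types would never finish printing, while a hand-written printer can break the cycle by printing only a name at the point where the structure refers back to itself.

```haskell
import Data.List (intercalate)

-- Toy stand-ins for the mutually recursive GHC types (hypothetical).
data ToyTyCon = ToyTyCon
  { tcName     :: String
  , tcDataCons :: [ToyDataCon]
  }

data ToyDataCon = ToyDataCon
  { dcName  :: String
  , dcTyCon :: ToyTyCon   -- points back at the parent: this is the cycle
  }

-- A cyclic value, tied through laziness.
boolTyCon :: ToyTyCon
boolTyCon = ToyTyCon "Bool" [falseCon, trueCon]

falseCon, trueCon :: ToyDataCon
falseCon = ToyDataCon "False" boolTyCon
trueCon  = ToyDataCon "True" boolTyCon

-- Printers that stop at names instead of following the back-references.
showTyCon :: ToyTyCon -> String
showTyCon tc =
  "TyCon " ++ tcName tc ++ " [" ++ intercalate ", " (map dcName (tcDataCons tc)) ++ "]"

showDataCon :: ToyDataCon -> String
showDataCon dc = "DataCon " ++ dcName dc ++ " (of " ++ tcName (dcTyCon dc) ++ ")"

main :: IO ()
main = do
  putStrLn (showTyCon boolTyCon)
  putStrLn (showDataCon trueCon)
```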
I'm not familiar with GHC internals but I have a couple questions about ConstraintKinds.
It says from GHC.Exts that
data Constraint :: BOX
which is misleading because Constraint is a kind of sort BOX. This brings us to the first question: can we import and export kinds? How does that work?
Please correct me on this next part if I'm totally off. From trying out different imports and glancing around at the source on hackage, my guess is that GHC.Exts imports Constraint from GHC.Base, which in turn imports it from GHC.Prim. But I do not see where it is defined in GHC.Prim.
To my knowledge, there is no definition of Constraint in any Haskell source file. It's a built-in, wired-in name that is defined to belong to GHC.Prim in the GHC sources themselves. So in particular Constraint is not a promoted datatype: there's no corresponding datatype of kind * that is called Constraint.
There are other kinds in GHC that are treated similarly, such as AnyK, OpenKind or even BOX itself.
GHC doesn't really make a big difference internally between datatypes and kinds and anything above. That's why they e.g. all show up as being defined using data albeit with different target kinds.
Note that as far as GHC is concerned, we also have
data BOX :: BOX
It's impossible for a user to directly define new "kinds" of super-kind BOX, though.
As far as I know, importing / exporting also makes no difference between the type and kind namespaces. So e.g.
import GHC.Exts (OpenKind, BOX, Constraint)
is legal. In fact, if you then say
x :: Constraint
x = undefined
you don't get a scope error, but a kind error, saying that a type of kind * is expected, but a type/kind of kind BOX is provided.
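Since Constraint is a kind rather than a type of kind *, it is meant to be used in kind positions. The sketch below shows two such uses. It assumes a more recent GHC than the one discussed here, where Constraint and Type are exported from Data.Kind; the names ShowEq, Some, describe and describeIfEqual are invented for the example.

```haskell
{-# LANGUAGE ConstraintKinds, GADTs, KindSignatures #-}
import Data.Kind (Constraint, Type)

-- A synonym whose right-hand side has kind Constraint.
type ShowEq a = (Show a, Eq a)

describeIfEqual :: ShowEq a => a -> a -> String
describeIfEqual x y
  | x == y    = "both are " ++ show x
  | otherwise = show x ++ " /= " ++ show y

-- A parameter of kind Type -> Constraint: c stands for a class such as Show.
data Some (c :: Type -> Constraint) where
  Some :: c a => a -> Some c

-- Matching on Some brings the stored Show dictionary into scope.
describe :: Some Show -> String
describe (Some x) = show x

main :: IO ()
main = do
  putStrLn (describeIfEqual 'a' 'a')
  putStrLn (describe (Some (42 :: Int)))
```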
I should perhaps also say that the whole story about kinds is somewhat in flux, and there are proposals being discussed that change this a bit: see e.g. https://ghc.haskell.org/trac/ghc/wiki/NoSubKinds for related discussion.