I am trying to work with GHC core data types.
I am able to compile my Haskell source to core representation with type Bind CoreBndr.
As we know there is no default Show instance for this data type.
There is a way to pretty print this representation but it has way too much noise associated with it.
I want to treat GHC core as any other algebraic data type and write functions with it.
It would be much easier if we had a Show instance of GHC core.
Has anybody already written a show instance which I can reuse?
As an aside, how does the community write and verify programs that deal with GHC core?
A naive implementation of Show in GHC is probably not what you want, because many of GHC's internal data types are mutually recursive. For instance, between TyCon, AlgTyConRhs, and DataCon we have:
TyCon has a constructor AlgTyCon, which contains an AlgTyConRhs.
AlgTyConRhs contains data_cons :: [DataCon] as one of its record fields.
DataCon contains dcRepTyCon :: TyCon as one of its fields.
And thus we come full circle. Because of how Show works, recursion like this will create infinite output if you ever attempt to print it.
In order to get a "nice" custom representation with data constructors and everything showing, you would have to write it yourself. This is actually somewhat challenging, since you have to consider and debug cases of recursion like this that default pretty printers have solved.
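To make the cycle concrete, here is a minimal sketch with toy types. ToyTyCon, ToyDataCon and their fields are made up for illustration and only mimic the shape of GHC's TyCon/DataCon; they are not the real definitions. The derived Show never terminates on the cyclic value, while a hand-written printer that prints only names across the back-edge does.

```haskell
import Data.List (intercalate)

-- Toy stand-ins (hypothetical, NOT GHC's real types) with the same cycle:
-- a type constructor lists its data constructors, and each data
-- constructor points back at its type constructor.
data ToyTyCon = ToyTyCon
  { toyTyConName :: String
  , toyDataCons  :: [ToyDataCon]
  } deriving Show

data ToyDataCon = ToyDataCon
  { toyDataConName :: String
  , toyRepTyCon    :: ToyTyCon
  } deriving Show

-- A knot-tied value: 'show toyBool' produces output forever.
toyBool :: ToyTyCon
toyBool = ToyTyCon "Bool" [falseCon, trueCon]
  where
    falseCon = ToyDataCon "False" toyBool
    trueCon  = ToyDataCon "True" toyBool

-- Hand-written printers that break the cycle by printing only the
-- name of whatever sits on the other side of the back-edge.
showToyTyCon :: ToyTyCon -> String
showToyTyCon tc =
  "ToyTyCon " ++ toyTyConName tc
    ++ " [" ++ intercalate ", " (map toyDataConName (toyDataCons tc)) ++ "]"

showToyDataCon :: ToyDataCon -> String
showToyDataCon dc =
  "ToyDataCon " ++ toyDataConName dc
    ++ " (of " ++ toyTyConName (toyRepTyCon dc) ++ ")"
```

Because String is lazy, take 1000 (show toyBool) still returns, which is a handy way to see the repetition without hanging the terminal.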
The following introduction is provided to ensure you understand how I reached the problem (to not fall prey to the XY problem):
I am working on a program which turns a parser written in a Parsec-like DSL into an actual LL(1) parser (and in the future similarly for LALR(1) or others).
It basically works as follows:
The DSL consists of functions which together build up a GADT. The result is a datatype which might contain cycles ('tying the knot'-style).
Data.Reify is used to turn this into a graph representation ('untying' the knot).
Perform the necessary transformations on this graph to turn it into an LL(1) parsing table.
Construct the parser which will use this table.
Because we want to be able to use the data that is recognized while parsing to construct some kind of result, we need to pass on functions through steps (1.) to (4.).
In steps 1, 2, 3 we can get away with using an existential datatype. It's only when actually running the parser, that I found myself requiring Data.Dynamic and its dynApp to combine the results. We know that the types line up (since in step (1) the GADT construction is type-checked), but I did not figure out how to use the existential types in any other way (as each of the parsing steps might have a very different type).
The current procedure thus 'works' but requires Dynamic. Also, the whole parser, while based on a written function definition, will be constructed at runtime.
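For readers unfamiliar with Data.Dynamic, the runtime combination described above looks roughly like the following sketch. mkPair, applyAll and result are hypothetical names chosen for illustration, not part of the actual parser; only toDyn, dynApp and fromDynamic come from Data.Dynamic.

```haskell
import Data.Dynamic (Dynamic, dynApp, fromDynamic, toDyn)

-- Hypothetical semantic action that a grammar rule might carry.
mkPair :: Int -> String -> (Int, String)
mkPair n s = (n, s)

-- Apply a dynamically typed function to dynamically typed arguments,
-- one at a time. dynApp raises a runtime error if the types do not
-- line up, which is exactly the check we would like to have statically.
applyAll :: Dynamic -> [Dynamic] -> Dynamic
applyAll = foldl dynApp

-- Recover a statically typed result; Nothing if the type is wrong.
result :: Maybe (Int, String)
result = fromDynamic (applyAll (toDyn mkPair) [toDyn (1 :: Int), toDyn "one"])
```

Data.Dynamic also offers dynApply, which returns a Maybe instead of throwing when the types do not match.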
Enter Template Haskell: Since the parser function is defined in a different module, it ought to be possible to construct the parser at compile-time.
However, there is no Lift instance for Dynamic!
Furthermore, attempting to directly lift the existential types (i.e. require a Lift constraint on them) instead also does not work, as these are almost always functions!
How can we lift a GADT containing either Dynamics or Typeable a => a's into a TemplateHaskell quotation?
Or is there another approach to be able to handle this situation?
I would like to create a frontend for a simple language that would produce GHC Core. I would like to then take this output and run it through the normal GHC pipeline. According to this page, it is not directly possible from the ghc command. I am wondering if there is any way to do it.
I am ideally expecting a few function calls to the ghc-api but I am also open to any suggestions that include (not-so-extensive) hacking in the source of GHC. Any pointers would help!
Note that Core is an explicitly typed language, which can make it quite difficult to generate from other languages (the GHC type checker has inferred all the types so it's no problem there). For example, the usual identity function (id = \x -> x :: forall a. a -> a) becomes
id = \(a :: *) (x :: a) -> x
where a is a type variable of kind *. It is a term-level place-holder for the type-level forall binding. Similarly, when calling id you need to give it a type as its first argument, so the Haskell expression (id 42) gets translated into (id Int 42). Such type bindings and type applications won't be present in the generated machine code, but they are useful to verify compiler transformations are correct.
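The closest thing in surface Haskell is the TypeApplications extension, which lets you write the type argument yourself. This is only an analogy for what Core does everywhere, not actual Core output; identity, implicit and explicit are names made up for this sketch.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeApplications #-}

-- The forall is written out so the type variable is visible,
-- mirroring the type lambda that Core makes explicit.
identity :: forall a. a -> a
identity x = x

-- Ordinary Haskell: the type checker infers a ~ Int.
implicit :: Int
implicit = identity 42

-- With an explicit type application, similar in spirit to the
-- (id Int 42) form described above.
explicit :: Int
explicit = identity @Int 42
```

A frontend that emits Core directly has to produce the equivalent of the second form at every polymorphic call site, which is why generating plain Haskell and letting GHC infer the types is often less work.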
On the bright side, it might be possible to just generate Haskell -- if you can generate the code in such a way that GHC will always be able to determine its type then you are essentially just using a tiny subset of Haskell. Whether this can work depends very much on your source language, though.
There's still no way to read External Core files, whether via the ghc command or the API. Sorry :(
It's probably theoretically possible to build the Core syntax tree up from your representation using the GHC API, but that sounds very painful. I would recommend targeting some other backend. You don't necessarily have to stop using GHC; straightforward Haskell with unboxed types and unsafeCoerce lets you get pretty close to the resulting Core, so you could define your own simple "Core-ish" language and compile it to that. (Indeed, you could probably even compile GHC Core itself, but that's a bit too meta for my tastes.)
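As a rough illustration of what "straightforward Haskell with unboxed types" can look like, here is a small sketch using the MagicHash extension and primitives exported from GHC.Exts. sumSquares# and sumSquares are hypothetical names; a generated "Core-ish" program would consist of many definitions in this style.

```haskell
{-# LANGUAGE MagicHash #-}

import GHC.Exts (Int (I#), Int#, (*#), (+#))

-- Worker on raw machine integers: no boxing, no laziness, close to
-- what GHC's own Core looks like after strictness analysis.
sumSquares# :: Int# -> Int# -> Int#
sumSquares# x y = (x *# x) +# (y *# y)

-- Thin boxed wrapper so ordinary Haskell code can call the worker.
sumSquares :: Int -> Int -> Int
sumSquares (I# x) (I# y) = I# (sumSquares# x y)
```

The code still goes through the type checker, so the generator has to emit well-typed Haskell; unsafeCoerce is the escape hatch mentioned above for places where the source language's types cannot be expressed.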
I want to treat UArray as an instance of Functor. I am writing numeric code, and I need to use something more efficient than Array to represent the state (says the profiler). I understand that I could write my code without using functors, but I think functors are a very valuable abstraction that I'd like to have.
As-is this doesn't work, because UArray is only an instance of IArray for certain basic types such as Int or Double. I am contemplating two approaches to make it work nonetheless:
Return an error (either implicitly or explicitly) if the result of fmap is not an instance of IArray
Define a "composite" type that is either based on UArray (if possible) or on Array (if not), akin to a C++ template specialization
I've tried various approaches based on various GHC extensions (existential types, functional dependencies, generalized algebraic data types, multi-parameter type classes, undecidable instances), but I just can't make things work. I always arrive at a point where I need to promise the compiler that "yes, the result will be representable via UArray", but there's just no syntax for it.
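For comparison, the array package already has a constrained map for this situation: amap from Data.Array.IArray (re-exported by Data.Array.Unboxed). The sketch below shows where the "promise" lives: it is the IArray constraint on both element types, which fmap's signature has no room for. The names state, doubled and rounded are made up for illustration.

```haskell
import Data.Array.Unboxed (UArray, amap, listArray)

-- A small unboxed state vector.
state :: UArray Int Double
state = listArray (0, 3) [1.0, 2.0, 3.0, 4.0]

-- amap :: (IArray a e', IArray a e, Ix i) => (e' -> e) -> a i e' -> a i e
-- Both element types must be storable in a UArray; the constraints
-- carry that promise, so no Functor instance is needed.
doubled :: UArray Int Double
doubled = amap (* 2) state

-- Changing the element type is fine as long as the new type also
-- has an IArray UArray instance (Int does).
rounded :: UArray Int Int
rounded = amap round state
```

This does not give you fmap itself, but it keeps the "map over the container" abstraction while staying unboxed.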
I've read various papers, tutorials, and documentation for the GHC extensions above in the hope to find an example that tells me how to do that. The closest I could find is https://wiki.haskell.org/GADTs_for_dummies, which defines a class IsSimple that is very close to what I probably need.
Can you give me a pointer for how to get started?
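Data.Vector.Unboxed provides most of what I am looking for. Its Vector is not an instance of Functor (fmap has no place for the Unbox constraint), but the module exports its own map with the type (Unbox a, Unbox b) => (a -> b) -> Vector a -> Vector b, and it picks an efficient unboxed representation for every element type that has an Unbox instance.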
In Haskell, you can have infinite lists, because it doesn't completely compute them, it uses thunks. I am wondering if there is a way to serialize or otherwise save to a file a piece of data's thunk. For example let us say you have a list [0..]. Then you do some processing on it (I am mostly interested in tail and (:), but it should support doing filter or map as well.) Here is an example of sort of what I am looking for.
serial :: SerialThunk a => a -> SerThunk
serialized = serial ([0..] :: [Int])
main = writeToFile "foo.txt" serialized
And
deserial :: SerialThunk a => SerThunk -> a
main = do
  deserialized <- readFromFile "foo.txt" :: IO [Int]
  print $ take 10 deserialized
No. There is no way to serialize a thunk in Haskell. Once code is compiled it is typically represented as assembly (for example, this is what GHC does) and there is no way to recover a serializable description of the function, let alone the function and environment that you'd like to make a thunk.
Yes. You could build custom solutions, such as describing and serializing a Haskell expression. Deserialization and execution could happen by way of interpretation (e.g. using the hint package).
Maybe. Someone (you?) could make a compiler or modify an existing compiler to maintain more information in a platform-agnostic manner such that things could be serialized without the user manually leveraging hint. I imagine this is an area under exploration by the Cloud Haskell (aka distributed-haskell) developers.
Why? I have also wanted an ability to serialize functions so that I could pass closures around in a flexible manner. Most of the time, though, that flexibility isn't actually needed and instead people want to pass certain types of computations that can be easily expressed as a custom data type and interpretation function.
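For the list example in the question, the "custom data type and interpretation function" approach could look like the following sketch. ListExpr, interpret, serialize and deserialize are hypothetical names; the point is that the description of the computation is serialized, not the thunk, and the infinite list is rebuilt lazily after reading it back.

```haskell
import Text.Read (readMaybe)

-- A serializable description of a (possibly infinite) list of Ints.
-- Only the operations we care about get a constructor.
data ListExpr
  = From Int                -- [n ..]
  | Tail ListExpr           -- drop the first element
  | Cons Int ListExpr       -- x : xs
  | MapAdd Int ListExpr     -- map (+ k)
  | FilterEven ListExpr     -- filter even
  deriving (Show, Read, Eq)

-- The interpretation function turns a description back into a lazy list.
interpret :: ListExpr -> [Int]
interpret (From n)       = [n ..]
interpret (Tail e)       = drop 1 (interpret e)
interpret (Cons x e)     = x : interpret e
interpret (MapAdd k e)   = map (+ k) (interpret e)
interpret (FilterEven e) = filter even (interpret e)

-- Serialization goes through the derived Show/Read instances;
-- a binary format would work just as well.
serialize :: ListExpr -> String
serialize = show

deserialize :: String -> Maybe ListExpr
deserialize = readMaybe
```

Usage would be along the lines of writeFile "foo.txt" (serialize (Tail (From 0))) in one program and fmap interpret . deserialize <$> readFile "foo.txt" in another. The limitation is that arbitrary functions cannot be passed to map or filter, only those you have given a constructor.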
packman: "Evaluation-orthogonal serialisation of Haskell data, as a library" (thanks to a reddit link) -- is exactly what we have been looking for!
...this serialisation is orthogonal to evaluation: the argument is
serialised in its current state of evaluation, it might be
entirely unevaluated (a thunk) or only partially evaluated (containing
thunks).
...The library enables sending and receiving data between different nodes
of a distributed Haskell system. This is where the code originated:
the Eden runtime system.
...Apart from this obvious application, the functionality can be used to
optimise programs by memoisation (across different program runs), and
to checkpoint program execution in selected places. Both uses are
exemplified in the slide set linked above.
...Another limitation is that serialised data can only be used by the
very same binary. This is however common for many approaches to
distributed programming using functional languages.
...
Cloud Haskell supports serialization of function closures. http://www.haskell.org/haskellwiki/Cloud_Haskell
Apart from the work in Cloud Haskell and HdpH on "closures", and apart from the answers stating that thunks are not analyzable at runtime, I've found that:
:sprint in GHCi seems to have access to the internal thunk representation. Perhaps GHCi works with some special, non-optimized code. So in principle one could use this representation and the implementation of :sprint if one wants to serialize thunks, isn't that true?
http://hackage.haskell.org/package/ghc-heap-view-0.5.3/docs/GHC-HeapView.html -- "With this module, you can investigate the heap representation of Haskell values, i.e. to investigate sharing and lazy evaluation."
I'd be very curious to know what kind of working solutions for serializing closures can be made out of this stuff...