Using GHC to compile Core [duplicate]

Using GHC to compile Core [duplicate] - haskell

I would like to create a frontend for a simple language that would produce GHC Core. I would like to then take this output and run it through the normal GHC pipeline. According to this page, it is not directly possible from the ghc command. I am wondering if there is any way to do it.
I am ideally expecting a few function calls to the ghc-api but I am also open to any suggestions that include (not-so-extensive) hacking in the source of GHC. Any pointers would help!

Note that Core is an explicitly typed language, which can make it quite difficult to generate from other languages (the GHC type checker has inferred all the types so it's no problem there). For example, the usual identity function (id = \x -> x :: forall a. a -> a) becomes
id = \(a :: *) (x :: a) -> a
where a is a type variable of kind *. It is a term-level place-holder for the type-level forall binding. Similarly, when calling id you need to give it a type as its first argument, so the Haskell expression (id 42) gets translated into (id Int 42). Such type bindings and type applications won't be present in the generated machine code, but they are useful to verify compiler transformations are correct.
On the bright side, it might be possible to just generate Haskell -- if you can generate the code in such a way that GHC will always be able to determine its type then you are essentially just using a tiny subset of Haskell. Whether this can work depends very much on your source language, though.

There's still no way to read External Core files, whether via the ghc command or the API. Sorry :(
It's probably theoretically possible to build the Core syntax tree up from your representation using the GHC API, but that sounds very painful. I would recommend targeting some other backend. You don't necessarily have to stop using GHC; straightforward Haskell with unboxed types and unsafeCoerce lets you get pretty close to the resulting Core, so you could define your own simple "Core-ish" language and compile it to that. (Indeed, you could probably even compile GHC Core itself, but that's a bit too meta for my tastes.)

Related

`Show` instance for GHC core

I am trying to work with GHC core data types.
I am able to compile my Haskell source to core representation with type Bind CoreBndr.
As we know there is no default Show instance for this data type.
There is a way to pretty print this representation but it has way too much noise associated with it.
I want to treat GHC core as any other algebraic data type and write functions with it.
It would be much easier if we had a Show instance of GHC core.
Has anybody already written a show instance which I can reuse?
Aside, how does the community write and verify programs that deal with GHC core?

A naive implementation of Show in GHC is probably not what you want. The reason for this is because internally GHC has recursion among many of its data types. For instance, between TyCon, AlgTyConRhs, and DataCon we have:
TyCon has AlgTyCon, which contains AlgTyConRhs.
AlgTyConRhs contains data_cons :: [DataCon] as one of its record fields.
DataCon contains dcRepTyCon :: TyCon as one of its fields.
And thus we come full circle. Because of how Show works, recursion like this will create infinite output if you ever attempt to print it.
In order to get a "nice" custom representation with data constructors and everything showing, you would have to write it yourself. This is actually somewhat challenging, since you have to consider and debug cases of recursion like this that default pretty printers have solved.

Why has Haskell troubles resolving "overloaded" operators?

This post poses the question for the case of !! . The accepted answer tell us that what you are actually doing is creating a new function !! and then you should avoid importing the standard one.
But, why to do so if the new function is to be applied to different types than the standard one? Is not the compiler able to choose the right one according to its parameters?
Is there any compiler flag to allow this?
For instance, if * is not defined for [Float] * Float
Why the compiler cries
> Ambiguous occurrence *
> It could refer to either `Main.*', defined at Vec.hs:4:1
> or `Prelude.*',
for this code:
(*) :: [Float] -> Float -> [Float]
(*) as k = map (\a -> a*k) as -- here: clearly Float*Float
r = [1.0, 2.0, 3.0] :: [Float]
s = r * 2.0 -- here: clearly [Float] * Float
main = do
print r
print s

Allowing the compiler to choose the correct implementation of a function based on its type is the purpose of typeclasses. It is not possible without them.
For a justification of this approach, you might read the paper that introduced them: How to make ad-hoc polymorphism less ad hoc [PDF].

Really, the reason is this: in Haskell, there is not necessarily a clear association “variable x has type T”.
Haskell is almost as flexible as dynamic languages, in the sense that any type can be a type variable, i.e. can have polymorphic type. But whereas in dynamic languages (and also e.g. OO polymorphism or C++ templates), the types of such type-variables are basically just extra information attached to the value-variables in your code (so an overloaded operator can see: argument is an Int->do this, is a String->do that), in Haskell the type variables live in a completely seperate scope in the type language. This gives you many advantages, for instance higher-kinded polymorphism is pretty much impossible without such a system. However, it also means it's harder to reason about how overloaded functions should be resolved. If Haskell allowed you to just write overloads and assume the compiler does its best guess at resolving the ambiguity, you'd often end up with strange error messages in unexpected places. (Actually, this can easily happen with overloads even if you have no Hindley-Milner type system. C++ is notorious for it.)
Instead, Haskell chooses to force overloads to be explicit. You must first define a type class before you can overload methods, and though this can't completely preclude confusing compilation errors it makes them much easier to avoid. Also, it lets you express polymorphic methods with type resolution that couldn't be expressed with traditional overloading, in particular polymorphic results (which is great for writing very easily reusable code).

It is a design decision, not a theoretical problem, not to include this in Haskell. As you say, many other languages use types to disambiguate between terms on an ad-hoc way. But type classes have similar functionality and additionally allow abstraction over things that are overloaded. Type-directed name resolution does not.
Nevertheless, forms of type-directed name resolution have been discussed for Haskell (for example in the context of resolving record field selectors) and are supported by some languages similar to Haskell such as Agda (for data constructors) or Idris (more generally).

Compiling to GHC Core

I would like to create a frontend for a simple language that would produce GHC Core. I would like to then take this output and run it through the normal GHC pipeline. According to this page, it is not directly possible from the ghc command. I am wondering if there is any way to do it.
I am ideally expecting a few function calls to the ghc-api but I am also open to any suggestions that include (not-so-extensive) hacking in the source of GHC. Any pointers would help!

Note that Core is an explicitly typed language, which can make it quite difficult to generate from other languages (the GHC type checker has inferred all the types so it's no problem there). For example, the usual identity function (id = \x -> x :: forall a. a -> a) becomes
id = \(a :: *) (x :: a) -> a
where a is a type variable of kind *. It is a term-level place-holder for the type-level forall binding. Similarly, when calling id you need to give it a type as its first argument, so the Haskell expression (id 42) gets translated into (id Int 42). Such type bindings and type applications won't be present in the generated machine code, but they are useful to verify compiler transformations are correct.
On the bright side, it might be possible to just generate Haskell -- if you can generate the code in such a way that GHC will always be able to determine its type then you are essentially just using a tiny subset of Haskell. Whether this can work depends very much on your source language, though.

There's still no way to read External Core files, whether via the ghc command or the API. Sorry :(
It's probably theoretically possible to build the Core syntax tree up from your representation using the GHC API, but that sounds very painful. I would recommend targeting some other backend. You don't necessarily have to stop using GHC; straightforward Haskell with unboxed types and unsafeCoerce lets you get pretty close to the resulting Core, so you could define your own simple "Core-ish" language and compile it to that. (Indeed, you could probably even compile GHC Core itself, but that's a bit too meta for my tastes.)

In Haskell, is there some way to forcefully coerce a polymorphic call?

I have a list of values (or functions) of any type. I have another list of functions of any type. The user at runtime will choose one from the first list, and another from the second list. I have a mechanism to ensure that the two items are type compatible (value or output from first is compatible with input of second).
I need some way to call the function with the value (or compose the functions). If the second function has concrete types, unsafeCoerce works fine. But if it's of the form:
polyFunc :: MyTypeclass a => a -> IO ()
polyFunc x = print . show . typeclassFunc x
Then unsafeCoerce doesn't work since it can't resolve to a concrete type.
Is there any way to do what I'm trying to do?
Here's an example of what the lists might look like. However... I'm not limited to this, if there is some other way to represent these that will solve the problem, I would like to know. A critical thing to consider is that: the list can change at runtime so I do not know at compile time all the possible types that might be involved.
data Wrapper = forall a. Wrapper a
firstList :: [Wrapper]
firstList = [Wrapper "blue", Wrapper 5, Wrapper valueOfMyTypeclass]
data OtherWrapper = forall a. Wrapper (a -> IO ())
secondList :: [OtherWrapper]
secondList = [OtherWrapper print, OtherWrapper polyFunc]
Note: As for why I want to do such a crazy thing:
I'm generating code and typechecking it with hint. But that happens at runtime. The problem is that hint is slow at actually executing things and high performance for this is critical. Also, at least in certain cases, I do not want to generate code and run it through ghc at runtime (though we have done some of that, too). So... I'm trying to find somewhere in the middle: dynamically hook things together without having to generate code and compile, but run it at compiled speed instead of interpreted.

Edit: Okay, so now that I see what's going on a bit more, here's a very general approach -- don't use polymorphic functions directly at all! Instead, use functions of type Dynamic -> IO ()! Then, they can use "typecase"-style dispatch directly to choose which monomorphic function to invoke -- i.e. just switching on the TypeRep. You do have to encode this dispatch directly for each polymorphic function you're wrapping. However, you can automate this with some template Haskell if it becomes enough of a hassle.
Essentially, rather than overloading Haskell's polymorphism, just as Dynamic embeds an dynamically typed language in a statically typed language, you now extend that to embed dynamic polymorphism in a statically typed language.
--
Old answer: More code would be helpful. But, as far as I can tell, this is the read/show problem. I.e. You have a function that produces a polymorphic result, and a function that takes a polymorphic input. The issue is that you need to pick what the intermediate value is, such that it satisfies both constraints. If you have a mechanism to do so, then the usual tricks will work, making sure you satisfy that open question which the compiler can't know the answer to.

I'm not sure that I completely understand your question. But since you have value and function which have compatible types you could combine them into single value. Then compiler could prove that types do match.
{-# LANGUAGE ExistentialQuantification #-}
data Vault = forall a . Vault (a -> IO ()) a
runVault :: Vault -> IO ()
runVault (Vault f x) = f xrun

Write a Haskell interpreter in Haskell

A classic programming exercise is to write a Lisp/Scheme interpreter in Lisp/Scheme. The power of the full language can be leveraged to produce an interpreter for a subset of the language.
Is there a similar exercise for Haskell? I'd like to implement a subset of Haskell using Haskell as the engine. Of course it can be done, but are there any online resources available to look at?
Here's the backstory.
I am exploring the idea of using Haskell as a language to explore some of the concepts in a Discrete Structures course I am teaching. For this semester I have settled on Miranda, a smaller language that inspired Haskell. Miranda does about 90% of what I'd like it to do, but Haskell does about 2000%. :)
So my idea is to create a language that has exactly the features of Haskell that I'd like and disallows everything else. As the students progress, I can selectively "turn on" various features once they've mastered the basics.
Pedagogical "language levels" have been used successfully to teach Java and Scheme. By limiting what they can do, you can prevent them from shooting themselves in the foot while they are still mastering the syntax and concepts you are trying to teach. And you can offer better error messages.

I love your goal, but it's a big job. A couple of hints:
I've worked on GHC, and you don't want any part of the sources. Hugs is a much simpler, cleaner implementation but unfortunately it's in C.
It's a small piece of the puzzle, but Mark Jones wrote a beautiful paper called Typing Haskell in Haskell which would be a great starting point for your front end.
Good luck! Identifying language levels for Haskell, with supporting evidence from the classroom, would be of great benefit to the community and definitely a publishable result!

There is a complete Haskell parser: http://hackage.haskell.org/package/haskell-src-exts
Once you've parsed it, stripping out or disallowing certain things is easy. I did this for tryhaskell.org to disallow import statements, to support top-level definitions, etc.
Just parse the module:
parseModule :: String -> ParseResult Module
Then you have an AST for a module:
Module SrcLoc ModuleName [ModulePragma] (Maybe WarningText) (Maybe [ExportSpec]) [ImportDecl] [Decl]
The Decl type is extensive: http://hackage.haskell.org/packages/archive/haskell-src-exts/1.9.0/doc/html/Language-Haskell-Exts-Syntax.html#t%3ADecl
All you need to do is define a white-list -- of what declarations, imports, symbols, syntax is available, then walk the AST and throw a "parse error" on anything you don't want them to be aware of yet. You can use the SrcLoc value attached to every node in the AST:
data SrcLoc = SrcLoc
{ srcFilename :: String
, srcLine :: Int
, srcColumn :: Int
}
There's no need to re-implement Haskell. If you want to provide more friendly compile errors, just parse the code, filter it, send it to the compiler, and parse the compiler output. If it's a "couldn't match expected type a against inferred a -> b" then you know it's probably too few arguments to a function.
Unless you really really want to spend time implementing Haskell from scratch or messing with the internals of Hugs, or some dumb implementation, I think you should just filter what gets passed to GHC. That way, if your students want to take their code-base and take it to the next step and write some real fully fledged Haskell code, the transition is transparent.

Do you want to build your interpreter from scratch? Begin with implementing an easier functional language like the lambda calculus or a lisp variant. For the latter there is a quite nice wikibook called Write yourself a Scheme in 48 hours giving a cool and pragmatic introduction into parsing and interpretation techniques.
Interpreting Haskell by hand will be much more complex since you'll have to deal with highly complex features like typeclasses, an extremely powerful type system (type-inference!) and lazy-evaluation (reduction techniques).
So you should define a quite little subset of Haskell to work with and then maybe start by extending the Scheme-example step by step.
Addition:
Note that in Haskell, you have full access to the interpreters API (at least under GHC) including parsers, compilers and of course interpreters.
The package to use is hint (Language.Haskell.*). I have unfortunately neither found online tutorials on this nor tried it out by myself but it looks quite promising.

create a language that has exactly the features of Haskell that I'd like and disallows everything else. As the students progress, I can selectively "turn on" various features once they've mastered the basics.
I suggest a simpler (as in less work involved) solution to this problem. Instead of creating a Haskell implementation where you can turn features off, wrap a Haskell compiler with a program that first checks that the code doesn't use any feature you disallow, and then uses the ready-made compiler to compile it.
That would be similar to HLint (and also kind of its opposite):
HLint (formerly Dr. Haskell) reads Haskell programs and suggests changes that hopefully make them easier to read. HLint also makes it easy to disable unwanted suggestions, and to add your own custom suggestions.
Implement your own HLint "suggestions" to not use the features you don't allow
Disable all the standard HLint suggestions.
Make your wrapper run your modified HLint as a first step
Treat HLint suggestions as errors. That is, if HLint "complained" then the program doesn't proceed to compilation stage

Baskell is a teaching implementation, http://hackage.haskell.org/package/baskell
You might start by picking just, say, the type system to implement. That's about as complicated as an interpreter for Scheme, http://hackage.haskell.org/package/thih

The EHC series of compilers is probably the best bet: it's actively developed and seems to be exactly what you want - a series of small lambda calculi compilers/interpreters culminating in Haskell '98.
But you could also look at the various languages developed in Pierce's Types and Programming Languages, or the Helium interpreter (a crippled Haskell intended for students http://en.wikipedia.org/wiki/Helium_(Haskell)).

If you're looking for a subset of Haskell that's easy to implement, you can do away with type classes and type checking. Without type classes, you don't need type inference to evaluate Haskell code.
I wrote a self-compiling Haskell subset compiler for a Code Golf challenge. It takes Haskell subset code on input and produces C code on output. I'm sorry there isn't a more readable version available; I lifted nested definitions by hand in the process of making it self-compiling.
For a student interested in implementing an interpreter for a subset of Haskell, I would recommend starting with the following features:
Lazy evaluation. If the interpreter is in Haskell, you might not have to do anything for this.
Function definitions with pattern-matched arguments and guards. Only worry about variable, cons, nil, and _ patterns.
Simple expression syntax:
Integer literals
Character literals
[] (nil)
Function application (left associative)
Infix : (cons, right associative)
Parenthesis
Variable names
Function names
More concretely, write an interpreter that can run this:
-- tail :: [a] -> [a]
tail (_:xs) = xs
-- append :: [a] -> [a] -> [a]
append [] ys = ys
append (x:xs) ys = x : append xs ys
-- zipWith :: (a -> b -> c) -> [a] -> [b] -> [c]
zipWith f (a:as) (b:bs) = f a b : zipWith f as bs
zipWith _ _ _ = []
-- showList :: (a -> String) -> [a] -> String
showList _ [] = '[' : ']' : []
showList show (x:xs) = '[' : append (show x) (showItems show xs)
-- showItems :: (a -> String) -> [a] -> String
showItems show [] = ']' : []
showItems show (x:xs) = ',' : append (show x) (showItems show xs)
-- fibs :: [Int]
fibs = 0 : 1 : zipWith add fibs (tail fibs)
-- main :: String
main = showList showInt (take 40 fibs)
Type checking is a crucial feature of Haskell. However, going from nothing to a type-checking Haskell compiler is very difficult. If you start by writing an interpreter for the above, adding type checking to it should be less daunting.

You might look at Happy (a yacc-like parser in Haskell) which has a Haskell parser.

This might be a good idea - make a tiny version of NetLogo in Haskell. Here is the tiny interpreter.

see if helium would make a better base to build upon than standard haskell.

Uhc/Ehc is a series of compilers enabling/disabling various Haskell features.
http://www.cs.uu.nl/wiki/Ehc/WebHome#What_is_UHC_And_EHC

I've been told that Idris has a fairly compact parser, not sure if it's really suitable for alteration, but it's written in Haskell.

Andrej Bauer's Programming Language Zoo has a small implementation of a purely functional programming language somewhat cheekily named "minihaskell". It is about 700 lines of OCaml, so very easy to digest.
The site also contains toy versions of ML-style, Prolog-style and OO programming languages.

Don't you think it would be easier to take the GHC sources and strip out what you don't want, than it would be to write your own Haskell interpreter from scratch? Generally speaking, there should be a lot less effort involved in removing features as opposed to creating/adding features.
GHC is written in Haskell anyway, so technically that stays with your question of a Haskell interpreter written in Haskell.
It probably wouldn't be too hard to make the whole thing statically linked and then only distribute your customized GHCi, so that the students can't load other Haskell source modules. As to how much work it would take to prevent them from loading other Haskell object files, I have no idea. You might want to disable FFI too, if you have a bunch of cheaters in your classes :)

The reason why there are so many LISP interpreters is that LISP is basically a predecessor of JSON: a simple format to encode data. This makes the frontend part quite easy to handle. Compared to that, Haskell, especially with Language Extensions, is not the easiest language to parse.
These are some syntactical constructs that sound tricky to get right:
operators with configurable precedence, associativity, and fixity,
nested comments
layout rule
pattern syntax
do- blocks and desugaring to monadic code
Each of these, except maybe the operators, could be tackled by students after their Compiler Construction Course, but it would take the focus away from how Haskell actually works. In addition to that, you might not want to implement all syntactical constructs of Haskell directly, but instead implement passes to get rid of them. Which brings us to the literal core of the issue, pun fully intended.
My suggestion is to implement typechecking and an interpreter for Core instead of full Haskell. Both of these tasks are quite intricate by themselves already.
This language, while still a strongly typed functional language, is way less complicated to deal with in terms of optimization and code generation.
However, it is still independent from the underlying machine.
Therefore, GHC uses it as an intermediary language and translates most syntaxical constructs of Haskell into it.
Additionally, you should not shy away from using GHC's (or another compiler's) frontend.
I'd not consider that as cheating since custom LISPs use the host LISP system's parser (at least during bootstrapping). Cleaning up Core snippets and presenting them to students, along with the original code, should allow you to give an overview of what the frontend does, and why it is preferable to not reimplement it.
Here are a few links to the documentation of Core as used in GHC:
System FC: equality constraints and coercions
GHC/As a library
The Core type

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string