Implementation tips for whole-program static analysis for Haskell

As part of a research project on property-based testing, I need to do static whole-program analysis of Haskell programs. I'm looking for suggestions on how to implement whole-program analysis of Haskell programs, hopefully without building lots of infrastructure myself.
I looked at Template Haskell, which has many of the capabilities I need, but is missing a key feature: in Template Haskell as implemented in GHC, there appears to be no way to get the definition of a function by name. (Related SO question: How to get the declaration of a function using `reify`?)
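For reference, here is a minimal sketch of that `reify` limitation, assuming a recent GHC with the template-haskell library; `double` and `reifiedHasDefinition` are hypothetical names. The empty splice `$(return [])` ends the first declaration group so that `double` is already type-checked when `reify` runs. `reify` does return a `VarI` carrying the name and type, but GHC fills the `Maybe Dec` field with `Nothing`, so `reifiedHasDefinition` ends up `False`.

```haskell
{-# LANGUAGE TemplateHaskell #-}
import Language.Haskell.TH

-- Hypothetical function whose definition we would like to recover.
double :: Int -> Int
double x = x * 2

-- Empty top-level splice: ends the declaration group, so 'double' is fully
-- type-checked before the reify call below runs.
$(return [])

-- At compile time, reify 'double' and record whether GHC handed back its
-- declaration in the Maybe Dec field of VarI.
reifiedHasDefinition :: Bool
reifiedHasDefinition =
  $(do info <- reify 'double
       case info of
         VarI _ _ (Just _) -> [| True |]
         _                 -> [| False |])
```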
I suspect that there might be some way of doing whole-program analysis of Haskell programs using the GHC API, but I can't easily determine how this might be done from the GHC API documentation.
In particular, given a function call site, I need to be able to look up the corresponding function definition(s). I'm especially interested in Template Haskell or GHC API-based solutions.
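To make the requirement concrete, here is a toy sketch of the lookup being asked for. It is not the GHC API: a hypothetical mini expression language stands in for a real Haskell AST (such as GHC Core), the whole program is a `Data.Map` from top-level names to definitions, and resolving a call site means collecting the free references in a body and looking each one up in that map. A real analysis would also have to handle modules, type classes (where one call site can resolve to several instance methods) and higher-order functions.

```haskell
import qualified Data.Map as Map
import Data.Maybe (mapMaybe)

-- Hypothetical stand-in for a real Haskell AST (e.g. GHC Core).
data Expr
  = Var String       -- reference to a local or top-level name
  | App Expr Expr    -- function application
  | Lam String Expr  -- lambda abstraction
  | Lit Int          -- integer literal
  deriving (Show, Eq)

-- The "whole program": every top-level name mapped to its definition.
type Program = Map.Map String Expr

-- Names referenced in an expression that are not bound by an enclosing lambda.
freeRefs :: [String] -> Expr -> [String]
freeRefs bound (Var x)   = [x | x `notElem` bound]
freeRefs bound (App f a) = freeRefs bound f ++ freeRefs bound a
freeRefs bound (Lam x b) = freeRefs (x : bound) b
freeRefs _     (Lit _)   = []

-- For the top-level function 'caller', resolve each referenced name to its
-- definition; names with no definition in the program (primitives) are dropped.
calleeDefinitions :: Program -> String -> [(String, Expr)]
calleeDefinitions prog caller =
  case Map.lookup caller prog of
    Nothing   -> []
    Just body -> mapMaybe resolve (freeRefs [] body)
  where
    resolve name = (,) name <$> Map.lookup name prog

-- A three-function example program; "plus" is treated as a primitive.
example :: Program
example = Map.fromList
  [ ("inc",   Lam "x" (App (App (Var "plus") (Var "x")) (Lit 1)))
  , ("twice", Lam "f" (Lam "x" (App (Var "f") (App (Var "f") (Var "x")))))
  , ("entry", App (App (Var "twice") (Var "inc")) (Lit 0))
  ]
```

Here `map fst (calleeDefinitions example "entry")` is `["twice","inc"]`, while `calleeDefinitions example "inc"` is empty because `plus` has no definition in the program.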
Is there any way to do whole-program analysis of Haskell programs without building all the infrastructure myself?

Related

How to get a list of functions defined in a Haskell source file (for semantic analysis)?

I've found https://github.com/github/semantic; however, it does not yet seem to support Haskell source code.
I highly suspect such a library exists, however it's quite difficult to get relevant results from a web search.
What library supports this, ideally something fairly simple?
I suppose the de facto parser for Haskell in Haskell is GHC itself, but I'd assume interfacing with it would be difficult.

Looking for resources that help testing a Haskell implementation for standard conformance

I browsed around for a while on haskell.org, the haskell-prime wiki, etc., but did not find any resources, such as test suites, that would allow one to check a Haskell implementation for standards compliance.
Does anybody know whether such resources exist, and if so, could you point me to them?
Otherwise, what would the Haskell Prime Committee do if someone claimed to have a Haskell 2010-compliant implementation?
Aside from the written standards for Haskell 2010 and Haskell 98, I don't think there is an "official test suite" for compliance. If you are developing a Haskell implementation, perhaps you can adapt the GHC test framework to suit your needs.
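In that spirit, a conformance test can be as simple as a program that checks behaviour the Report pins down. The sketch below is hypothetical (`conformanceCases` and the case names are made up). It encodes the Haskell 2010 rules that `div`/`mod` round toward negative infinity, that `quot`/`rem` truncate toward zero, and that each pair satisfies its division law, plus one non-strictness check, and then lists the names of any failing cases.

```haskell
-- Hypothetical mini conformance suite: each case pairs a description with a
-- Bool that must be True on an implementation that follows the Report.
conformanceCases :: [(String, Bool)]
conformanceCases =
  [ ("div rounds toward negative infinity", (-7) `div` 2 == (-4 :: Int))
  , ("mod takes the sign of the divisor",   (-7) `mod` 2 == (1 :: Int))
  , ("quot truncates toward zero",          (-7) `quot` 2 == (-3 :: Int))
  , ("rem takes the sign of the dividend",  (-7) `rem` 2 == (-1 :: Int))
  , ("quot/rem law", let (x, y) = (-7 :: Int, 2 :: Int)
                     in (x `quot` y) * y + (x `rem` y) == x)
  , ("div/mod law",  let (x, y) = (-7 :: Int, 2 :: Int)
                     in (x `div` y) * y + (x `mod` y) == x)
  , ("length does not force list elements",
      length ([1, undefined] :: [Int]) == 2)
  ]

-- Names of the cases that do not hold on this implementation.
failures :: [String]
failures = [name | (name, ok) <- conformanceCases, not ok]
```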

What would be involved in calling ARPACK++ (a C++ library) from Haskell?

I've spent a couple of days developing a program in Haskell while learning the language. Now I realize that I'll need to call Arpack (a Fortran library) or Arpack++ (a C++ wrapper around Arpack); I can't find a good implementation of the Lanczos method with Haskell bindings. Do any more experienced Haskell programmers have an opinion on how difficult this would be?
I've been able to install ".so" ("shared object") versions of libarpack and libarpack++ from Ubuntu's repository, but I'm not sure that will suffice. I suspect I'm ultimately going to need to build Arpack++ from source, which is possible, but I'm getting a lot of build errors, so it will take time. Is there any way to use just the ".so" files without knowing exactly which version of the header files was used to generate them?
I'm considering using GreenCard, because it looks like the best-maintained Haskell/C bridge. I can't find much documentation, though, so I'm wondering whether it supports C++ too.
I'm also starting to wonder whether I should rewrite my program in Python, and use scipy to call Arpack, but I've already sunk a couple of days into writing Haskell. I really like Haskell too, so I'm hoping I can make this work. I guess my overall question is this: What would be involved in making this work with Haskell?
Thanks much.
ELF is the standard format for executables and shared libraries on Linux, so accessing the code in these compiled modules is mostly a matter of knowing the function names. If I understand correctly, Fortran is interoperable with C; as a consequence, Fortran should be interoperable with any language that can use C bindings, including Haskell. FYI, you can list the names exported by a module (executable, shared object, or static archive) with the nm tool, which is available by default in most Linux distros. For a shared object, use nm -D: it reads the dynamic symbol table, which strip leaves in place, so this works even on stripped libraries.
However, Haskell cannot use C++ bindings in a sane way, because C++ features such as overloading and templates require name mangling, and the mangling scheme is highly compiler-dependent. This is a well-known problem that is not specific to Haskell. Of course, you could try to get a list of exported symbols from the C++ shared object and then bind them using the FFI, but it isn't worth it.
As dsign said, you can use GHC's Foreign Function Interface (FFI) to create bindings to foreign code. All you need are the library headers (and the library itself, of course). For C, that would be the header files (*.h), but since your library is written in Fortran, you have to find the equivalent declarations in the library sources, refer to this page to match Fortran and C types, and then use this information to write FFI bindings. It would help to first write C bindings, i.e. a C header; then you can even use automatic FFI binding generators such as c2hs.
It may also be helpful to look through the C++ bindings; they may already contain the header file I've described above. If so, writing FFI bindings will be no more difficult than writing them for any other library.
So it is not impossible, but it may require some thorough work. Writing bindings to scientific/purely computational libraries is much easier than writing them for a system library that does a lot of IO and keeps its own internal state, but since this library is not written in C... well, it may be advisable to invest your time in easier alternatives. I can't say anything about scipy, since I've never used it, but since Python as a language is much simpler than Haskell, it may be a good alternative.
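As a minimal sketch of what an FFI binding looks like, here is a binding to the C math library's `sin`, used as a stand-in for Arpack because it needs no extra library. The commented-out part is an assumption about Fortran interop rather than a tested Arpack binding: with gfortran's default conventions the symbol name is lowercase with a trailing underscore, and every argument is passed by reference. `DSCALE` is a hypothetical subroutine.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.Types (CDouble (..))

-- Bind sin() from the C math library. "math.h sin" names the header and the
-- C symbol; CDouble is the Haskell counterpart of C's double.
foreign import ccall unsafe "math.h sin"
  c_sin :: CDouble -> CDouble

-- Wrapper so the rest of the program can work with plain Double.
cSin :: Double -> Double
cSin = realToFrac . c_sin . realToFrac

-- A Fortran routine is bound the same way, but (assuming gfortran's default
-- conventions) its symbol is lowercase with a trailing underscore and every
-- argument is passed by reference. For a hypothetical
--   SUBROUTINE DSCALE(N, ALPHA, X)
-- the import might read:
--   foreign import ccall unsafe "dscale_"
--     c_dscale :: Ptr CInt -> Ptr CDouble -> Ptr CDouble -> IO ()
-- with the pointers obtained via with/withArray from Foreign.Marshal.
```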
I can tell you that using a C/Fortran library from Haskell with the help of the Foreign Function Interface is certainly possible and not terribly complicated. Here is an introduction. As I understand it, you should be able to call anything with a C calling convention, and perhaps even Fortran, without needing to recompile the code. The only exception is things that look like function calls but are actually macros; in that case you will have to figure out what the macros do and reproduce them in Haskell.
As for GreenCard, I have never used it, so I cannot vouch for it.
Your second idea of using Python could potentially save you more than a couple of days. Sad as it is, I have never managed to make Haskell code adapt easily to my changing requirements, while I find that trivial in Python. Of course, that could be a limitation of my Haskell skills or my thinking process rather than something to blame on the language.
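To illustrate the macro caveat with a hypothetical example: suppose a C header defines column-major (Fortran-style) element access as a macro. A macro leaves no symbol in the shared library, so `foreign import` cannot bind it. The sketch below reimplements the same index arithmetic in Haskell and uses it with `withArray` and `peekElemOff` from the standard `Foreign` modules; `ELEM`, `colMajorIndex`, `peekElem` and `elementAt` are made-up names.

```haskell
import Foreign.Marshal.Array (withArray)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peekElemOff)

-- Hypothetical C header macro we cannot bind with `foreign import`:
--   #define ELEM(a, i, j, ld) ((a)[(i) + (j) * (ld)])
-- Macros are expanded by the C preprocessor and leave no symbol in the .so,
-- so the same arithmetic is reproduced here in Haskell.
colMajorIndex :: Int -> Int -> Int -> Int
colMajorIndex i j ld = i + j * ld

-- Haskell counterpart of ELEM: read entry (i, j) (zero-based) of a
-- column-major matrix whose leading dimension (number of rows) is ld.
peekElem :: Ptr Double -> Int -> Int -> Int -> IO Double
peekElem p i j ld = peekElemOff p (colMajorIndex i j ld)

-- Convenience: copy a Haskell list into a temporary C array and read one entry.
elementAt :: [Double] -> Int -> Int -> Int -> IO Double
elementAt xs i j ld = withArray xs (\p -> peekElem p i j ld)
```

For the 2x3 matrix [[1,2,3],[4,5,6]] stored column by column as `[1, 4, 2, 5, 3, 6]`, `elementAt [1, 4, 2, 5, 3, 6] 1 2 2` reads the entry in row 1, column 2, which is 6.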

Functional programming languages introspection

I'm sketching a design of something (machine learning of functions) that will preferably want a functional programming language, and also introspection, specifically the ability to examine the program's own code in some nicely tractable format, and preferably also the ability to get machine generated code compiled at runtime, and I'm wondering what's the best language to write it in. Lisp of course has strong introspection capabilities, but the statically typed languages also have advantages; the ones I'm considering are:
F# - the .Net platform has a good story here, you can read byte code at run time and also emit byte code and get it compiled; I assume there's no problem accessing these facilities from F#.
Haskell, Ocaml - do these have similar facilities, either via byte code or parse tree?
Are there other languages I should also be looking at?
Haskell's introspection mechanism is Template Haskell, which supports compile-time metaprogramming and, when combined with e.g. LLVM, provides runtime metaprogramming facilities.
OCaml has:
Camlp4 to manipulate OCaml concrete syntax trees in OCaml. The maintained implementation of Camlp4 is Camlp5.
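A small sketch of what that looks like in practice, assuming GHC with the template-haskell library: the quotation `[| \x -> x * x |]` yields the expression's syntax tree (a value of type `Exp`) rather than a function, and ordinary Haskell code can then inspect it. `countVarRefs` is a hypothetical helper that only handles the few constructors this example needs. Running `Q` in `IO` via `runQ` is enough for quotations, although `reify` would fail outside a compile-time splice.

```haskell
{-# LANGUAGE TemplateHaskell #-}
import Language.Haskell.TH

-- Quote a lambda: this is its abstract syntax tree, not a function value.
squareAst :: Q Exp
squareAst = [| \x -> x * x |]

-- Count variable references (VarE nodes) in a small subset of Exp.
countVarRefs :: Exp -> Int
countVarRefs (VarE _)        = 1
countVarRefs (AppE f a)      = countVarRefs f + countVarRefs a
countVarRefs (InfixE l op r) = maybe 0 countVarRefs l + countVarRefs op + maybe 0 countVarRefs r
countVarRefs (LamE _ body)   = countVarRefs body
countVarRefs _               = 0  -- other constructors are ignored in this sketch

-- Build the AST in IO and count its variable references:
-- x, the (*) operator, and x again.
squareVarRefs :: IO Int
squareVarRefs = countVarRefs <$> runQ squareAst
```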
MetaOCaml for full-scale multi-stage programming.
Ocamljit to generate native code at run time, but I don't think it's been maintained recently.
OCaml-Java to compile OCaml code for the Java virtual machine. I don't know if there are nice reflection capabilities.
Not really an answer, but note also the F# Quotations feature and library, for more homoiconicity stuff.
You might check out the typed variant of Racket (previously known as PLT Scheme). It retains most of the syntactic simplicity of Scheme, but provides a static type system. Since Racket is a Scheme, metaprogramming is par for the course, and the runtime can emit native code by way of a JIT.
The Haskell approach would be more along the lines of parsing the source. The Haskell Platform includes a complete source parser, or you can use the GHC API to get access that way.
I'd also look at Scala or Clojure, which come with all the libraries that have been developed for Java, so you'll rarely need to worry about whether a library exists. But more to the point of your question, these languages give you the same reflection (or more powerful types) that you will find in Java.
I'm sketching a design of something (machine learning of functions) that will preferably want a functional programming language, and also introspection, specifically the ability to examine the program's own code in some nicely tractable format, and preferably also the ability to get machine generated code compiled at runtime, and I'm wondering what's the best language to write it in. Lisp of course has strong introspection capabilities, but the statically typed languages also have advantages; the ones I'm considering are:
Can you not just parse the source code like an ordinary interpreter or compiler? Why do you need introspection?
F# - the .Net platform has a good story here, you can read byte code at run time and also emit byte code and get it compiled; I assume there's no problem accessing these facilities from F#.
F# has a rudimentary quotation mechanism but you can only quote some expressions and not other kinds of code, most notably type definitions. Also, its evaluation mechanism is orders of magnitude slower than genuine compilation so it is basically completely useless. You can use reflection to analyze type definitions but, again, it is quite rudimentary.
You can read the byte code, but it has already been compiled, so a lot of information and structure has been lost.
F# also has lexing and parsing technology (most notably fslex, fsyacc and FParsec) but it is not as mature as OCaml's.
Haskell, Ocaml - do these have similar facilities, either via byte code or parse tree?
Haskell has Template Haskell but I've never heard of anyone using it (abandonware?).
OCaml has its Camlp4 macro system and a few people do use it but it is poorly documented.
As for lexing and parsing, Haskell has a few libraries (most notably Parsec) and OCaml has many libraries.
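To make the "just parse the source" suggestion concrete, here is a minimal Parsec sketch, assuming the parsec package (which recent GHC installations ship as a boot library) and a hypothetical arithmetic language rather than Haskell itself. The parser turns text into an ordinary algebraic data type, which is the kind of tractable representation the question asks for; `chainl1` builds left-associative operator chains, and the two-level `expr`/`term` split gives `*` higher precedence than `+`.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical AST for a tiny arithmetic language.
data Expr = Num Integer | Var String | Add Expr Expr | Mul Expr Expr
  deriving (Show, Eq)

-- Run a parser, then skip any trailing whitespace.
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

symbol :: Char -> Parser Char
symbol = lexeme . char

-- expr handles '+', term handles '*', so '*' binds tighter.
expr, term, factor :: Parser Expr
expr   = term `chainl1` (Add <$ symbol '+')
term   = factor `chainl1` (Mul <$ symbol '*')
factor = lexeme (Num . read <$> many1 digit)
     <|> lexeme (Var <$> many1 letter)
     <|> between (symbol '(') (symbol ')') expr

-- Parse a whole input string, rejecting leftover characters.
parseExpr :: String -> Either ParseError Expr
parseExpr = parse (spaces *> expr <* eof) "<input>"
```

For example, `parseExpr "1 + x * (2 + 3)"` gives `Right (Add (Num 1) (Mul (Var "x") (Add (Num 2) (Num 3))))`, while an incomplete input such as `"1 +"` gives a `Left` parse error.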
Are there other languages I should also be looking at?
Term-rewriting languages like Mathematica would be an obvious choice because they make it trivial to manipulate code. The Pure language might be of interest.
You might also consider MetaOCaml for its run-time compilation capabilities.

When choosing a functional programming language for use with LLVM, what are the trade-offs?

Let's assume for the moment that C++ is not a functional programming language. If you want to write a compiler using LLVM for the back-end, and you want to use a functional programming language and its bindings to LLVM to do your work, you have two choices as far as I know: Objective Caml and Haskell. If there are others, then I'd like to know about those too.
I'm not asking for subjective opinions, so please don't give this the subjective tag. I want to make up my own mind about this, but I'm not sure I know what are all the trade-offs. So, StackOverflow to the rescue. What are the trade-offs?
Either OCaml or Haskell would be a good choice. Why not check out the LLVM tutorials for each language? The LLVM tutorial for OCaml is here: http://llvm.org/docs/tutorial/OCamlLangImpl1.html
Haskell has more momentum these days, but there are plenty of good parsing libraries for OCaml as well, including the PEG parser generator Aurochs, Menhir, and the GLR parser generator Dypgen. Also check out this presentation on pcl, a monadic parser combinator library for OCaml (like Parsec for Haskell); there's some good info in there comparing Haskell's and OCaml's approaches: http://osp.janestreet.com/files/pcl.pdf
Some will say that laziness gives Haskell the edge in parsing, but you can get laziness in OCaml as well.
Haskell has higher-level bindings to LLVM than OCaml (the Haskell ones provide some interesting type-safety guarantees), and Haskell has far more libraries to use (1700 packages on http://hackage.haskell.org), making it easier to glue components together.
Availability of native bindings need not constrain your choice of language. There is a third option, apart from using bindings or generating IR text directly:
You can use a language-neutral serialization format, such as Google's Protocol Buffers, to serve as the bridge from your front-end to your back-end. Protocol buffers are, after all, just ASTs in disguise.
Your front end, implemented in a functional language, then does what it is best at -- parsing, type checking, desugaring, core-to-core transformations, etc -- and the C++ backend takes the IR from your frontend and uses LLVM's feature-complete-by-definition native C++ API to do lowering from your-language-IR to LLVM IR. This makes it much easier to handle "advanced" features of LLVM such as debug metadata.
I'm using this strategy with hprotoc and associated Haskell bindings for protocol buffers, and am very happy with the results. There is much to be said for using the right tool for the job!
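For comparison with the bindings and protocol-buffer routes, the "generating IR text directly" option mentioned in this answer can be sketched in a few lines of Haskell. This is a hypothetical toy (the `Expr` type, `gen`, `emitFunction` and the `%tN` naming scheme are made up), not hprotoc or any LLVM binding: it walks an expression over one `i64` argument and prints an LLVM IR function, threading a counter to produce fresh SSA names.

```haskell
-- Hypothetical source language: expressions over a single i64 argument %x.
data Expr = Arg | Const Integer | Add Expr Expr | Mul Expr Expr

-- Lower an expression given the next free temporary number. Returns the
-- emitted instructions, the operand holding the result, and the next free number.
gen :: Int -> Expr -> ([String], String, Int)
gen n Arg       = ([], "%x", n)
gen n (Const c) = ([], show c, n)
gen n (Add a b) = binop "add" n a b
gen n (Mul a b) = binop "mul" n a b

-- Lower both operands left to right, then emit one instruction into a fresh %tN.
binop :: String -> Int -> Expr -> Expr -> ([String], String, Int)
binop op n a b =
  let (ia, va, n1) = gen n a
      (ib, vb, n2) = gen n1 b
      tmp = "%t" ++ show n2
  in (ia ++ ib ++ ["  " ++ tmp ++ " = " ++ op ++ " i64 " ++ va ++ ", " ++ vb], tmp, n2 + 1)

-- Wrap the lowered body in an LLVM IR function definition.
emitFunction :: String -> Expr -> String
emitFunction name body =
  let (instrs, result, _) = gen 0 body
  in unlines $
       ["define i64 @" ++ name ++ "(i64 %x) {", "entry:"]
       ++ instrs
       ++ ["  ret i64 " ++ result, "}"]
```

For instance, `putStr (emitFunction "f" (Mul (Add Arg (Const 1)) Arg))` prints a function whose body is `%t0 = add i64 %x, 1`, `%t1 = mul i64 %t0, %x` and `ret i64 %t1`. The design trade-off is the one the answer describes: text is easy to produce from any language, but advanced features such as debug metadata are easier through LLVM's native C++ API.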
OCaml is the only functional language with bindings in the LLVM distro itself and documentation on llvm.org such as the Kaleidoscope tutorial. If you have OCaml installed when you build and install LLVM then it will automatically build and install the LLVM bindings for OCaml as well. Moreover, these OCaml bindings have been in use for years so they are mature and reliable.
I have been developing HLVM in OCaml using the standard LLVM bindings and found OCaml+LLVM to be an extremely powerful combination. HLVM provides tuples, arrays, unions, tail-call optimization (TCO) of all tail calls, generic printing, FFI to C, JIT compilation and parallel garbage collection with a VM weighing in at under 2kLOC of OCaml code that took only a few man-weeks to develop from scratch. HLVM's numerical performance already far exceeds that of today's fastest open source FPLs, including OCaml itself. I have published articles in the OCaml Journal describing how LLVM can be used from OCaml for everything from basic expression evaluation to advanced topics such as parallelism and garbage collection. You may also like this mini example.

Resources