Difference between hsc2hs and c2hs?

What is the difference between hsc2hs and c2hs?
I know that hsc2hs is a preprocessor, but what exactly does it do?
And c2hs can make Haskell modules from C code, but do I need hsc2hs for that?

They both have the same goal: making it easier to write FFI bindings. You don't need to know about hsc2hs if you choose to use c2hs; they are independent. c2hs is more powerful, but also more complicated; Edward Z. Yang illustrates this point with a nice diagram in his c2hs tutorial:
When should I use c2hs? There are many Haskell pre-processors; which one should you use? A short (and somewhat inaccurate) way to characterize the above hierarchy is: the further down you go, the less boilerplate you have to write and the more documentation you have to read. I have thus heard advice that hsc2hs is what you should use for small FFI projects, while c2hs is more appropriate for the larger ones.
Things that c2hs supports that hsc2hs does not:
- Automatic generation of foreign import declarations, based on the contents of the C header file
- Semi-automatic marshalling to and from function calls
- Translation of pointer types and hierarchies into Haskell types
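For a flavour of the first two points, here is a minimal .chs sketch in the style of the tutorial's sin example; the fun hook asks c2hs to generate the foreign import and the Double marshalling itself (treat it as illustrative rather than definitive):

    {-# LANGUAGE ForeignFunctionInterface #-}
    module Sin where

    #include <math.h>

    -- c2hs reads math.h, generates the foreign import for sin,
    -- and marshals the Double argument and result:
    {# fun pure sin as haskellSin { `Double' } -> `Double' #}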

Mikhail's answer is good, but there's another side. There are also things that hsc2hs provides that c2hs does not, and it may be necessary to use both in conjunction.
Notably, hsc2hs operates by producing a C executable that is run to generate Haskell code, while c2hs parses header files directly. Therefore hsc2hs allows you to access #defines, etc. So while I've found c2hs better for generating bindings and wrappers to bindings as well as "deep" peeks and pokes into complex C structures, it is not good for accessing constants and enumerations, and it only automates mildly the boilerplate for Storable instances. I've found hsc2hs necessary as well, in conjunction with the bindings-dsl package [1], in particular in my case for predefined constants. In one instance, I have one hsc file for an enormous amount of constants, and one chs file for wrapping the functions that use these constants.
[1] http://hackage.haskell.org/package/bindings-DSL
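For comparison, here is a minimal .hsc sketch of the constants side, assuming a hypothetical header mylib.h containing #define MYLIB_MAX_NAME 64; because hsc2hs compiles and runs a small C program, the #define's value can be spliced straight into the Haskell source:

    module MyLib.Constants (maxNameLen) where

    #include "mylib.h"

    -- hsc2hs substitutes the numeric value of the C macro here:
    maxNameLen :: Int
    maxNameLen = #{const MYLIB_MAX_NAME}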

Related

Haskell: Coherence rules wrt. external code

In Rust, I am not allowed (for good reasons) to, for example, implement libA::SomeTrait for libB::SomeType.
Are there similar rules enforced in Haskell, or does Haskell just sort of unify the whole program and make sure it's coherent as a whole?
Edit: To phrase the question more clearly in Haskell terms, can you have an orphan instance where the class and the type are defined in modules from external packages (say, from Hackage)?
It looks like you can have an orphan instance wrt. modules. Can the same be done wrt. packages?
Yes. In Haskell, the compiler doesn't really concern itself with packages at all, except for looking up which modules are available. (Basically, Cabal or Stack tells GHC which ones are available, and GHC just uses them.) And an available module in another package is treated exactly the same as a module in your own package.
Most often, this is not a problem: people are expected to be sensible about what instances they define. If somebody does define a bad instance, it only matters for people who use both libA and libB, and they should be aware of it.
If you do have a particular reason why you want to prevent somebody from instantiating a class, you should just not export it. (You can still export functions with that class in their constraint.)
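For concreteness, here is what such a cross-package orphan instance looks like; Enum comes from base and Text from the text package, and neither is defined in this module, so the instance is an orphan (GHC accepts it, warning only if -Worphans is enabled). The instance itself is purely illustrative:

    module Orphan where

    import Data.Text (Text)
    import qualified Data.Text as T

    -- Neither Enum nor Text is defined here: an orphan instance.
    instance Enum Text where
      toEnum   = T.singleton . toEnum   -- Int -> Char -> Text
      fromEnum = fromEnum . T.head      -- partial: assumes non-empty Text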

GHC internals: is there C implementation of the type system?

I'm looking into the internals of GHC, and I find all the parsing and the type system written completely in Haskell. The low-level core of the language is provided by the RTS. The question is: which one of the following is true?
- The RTS contains a C implementation of the type system and other basic parts of Haskell (I didn't find it; the RTS is mainly GC and threading).
- Everything is implemented in Haskell itself. But that seems quite tricky, because building GHC already requires GHC.
Could you explain the development logic of the compiler? For example, Python's internals provide an opaque implementation of everything in C.
As others have noted in the comments, GHC is written almost entirely in Haskell (plus select GHC extensions) and is intended to be compiled with itself. In fact, the only program in the world that can compile the GHC compiler is the GHC compiler! In particular, parsing and type inference are implemented in Haskell code, and you won't find a C implementation hidden in there anywhere.
The best source for understanding the internal structure of the compiler (and what's implemented how) is the GHC Developer Wiki, specifically the "GHC Commentary" link. If you have a fair bit of spare time, the video series from the Portland 2006 GHC Hackathon is absolutely fascinating.
Note that the idea of a compiler being written in the language it compiles is not unusual. Many compilers are "self-hosting", meaning that they are written in the language they compile and are intended to compile themselves. See, for example, this question on another Stack Exchange site: Why are self-hosting compilers considered a rite of passage for new languages?, or simply Google for "self-hosting compiler".
As you say, this is "tricky", because you need a way to get the process started. Some approaches are:
- You can write the first compiler in a different language that already has a compiler (or write it in assembly language); then, once you have a running compiler, you can port it to the same language it compiles. According to this Quora answer, the first C compiler was written this way. It was written in "NewB", whose compiler was written in "B", a self-hosting compiler that had originally been written in assembly and then rewritten in itself.
- If the language is popular enough to have another compiler, you can write the compiler in its own language and compile it in phases: first with the other compiler, then with itself (as compiled by the other compiler), then again with itself (as compiled by itself). The last two compiler executables can be compared as a sort of massive test that the compiler is correct. The GNU C Compiler can be compiled this way (and this certainly used to be the standard way to install it from source, using the vendor's [inferior!] C compiler to get started).
- If an interpreter written in another language already exists or is easy to write, the compiler can be run by the interpreter to compile its own source code, and thereafter the compiled compiler can be used to compile itself. The first LISP compiler is claimed to be the first compiler to bootstrap itself this way.
The bootstrapping process can often be simplified by writing the compiler (at least initially) in a restricted core of the language, even though the compiler itself is capable of compiling the full language. Then, a sub-par existing compiler or a simplified bootstrapping compiler or interpreter can get the process started.
According to the Wikipedia entry for GHC, the original GHC compiler was written in 1989 in Lazy ML, then rewritten in Haskell later the same year. These days, new versions of GHC with all their shiny new features are compiled on older versions of GHC.
The situation for the Python interpreter is a little different. An interpreter can be written in the language it interprets, of course, and there are many examples in the Lisp world of writing Lisp interpreters in Lisp (for fun, or in developing a new Lisp dialect, or because you're inventing Lisp), but it can't be interpreters all the way down: eventually you need either a compiler or an interpreter implemented in another language. As a result, most interpreters aren't self-hosting: the mainstream interpreters for Python, Ruby, and PHP are written in C. (Though PyPy is an alternate implementation of the Python interpreter that's written in Python, so...)

Haskell compiler magic: what requires special treatment from the compiler?

When trying to learn Haskell, one of the difficulties that arises is knowing when something requires special magic from the compiler. One example that comes to mind is the seq function, which can't be defined in ordinary Haskell: you can't write a seq2 function that behaves exactly like the built-in seq. Consequently, when teaching someone about seq, you need to mention that it's special because the compiler treats it specially.
Another example would be the do-notation, which only works with instances of the Monad class.
Sometimes it's not obvious. For instance, continuations: does the compiler know about Control.Monad.Cont, or is it plain old Haskell that you could have invented yourself? In this case, I think nothing special is required from the compiler, even though continuations are a very strange kind of beast.
Language extensions aside, what other compiler magic should Haskell learners be aware of?
Nearly all the GHC primitives that cannot be implemented in userland live in the ghc-prim package (it even has a module called GHC.Magic!), so browsing it will give you a good sense of what's primitive.
Note that you should not use that package in userland code unless you know exactly what you are doing. Most of the usable stuff from it is re-exported in downstream modules in base, sometimes in modified form. Those downstream locations and APIs are considered more stable, while ghc-prim makes no guarantees as to how it will behave from version to version.
The GHC-specific stuff is reexported in GHC.Exts, but plenty of other things go into the Prelude (such as basic data types, as well as seq) or the concurrency libraries, etc.
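For example, the primitive addition on unboxed integers lives in ghc-prim, but the supported way to reach it is through GHC.Exts (a small sketch):

    {-# LANGUAGE MagicHash #-}
    import GHC.Exts (Int (I#), (+#))

    -- Unwrap the boxed Ints, add the raw machine integers, re-box:
    addPrim :: Int -> Int -> Int
    addPrim (I# x) (I# y) = I# (x +# y)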
Polymorphic seq is definitely magic. You can implement seq for any specific type, but only the compiler can implement one function that works for all possible types (and avoid optimising it away, even though it looks like a no-op).
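A sketch of the monomorphic case, which is ordinary Haskell because pattern matching forces evaluation:

    -- Forces its first argument to WHNF, just like seq, but only at Bool:
    seqBool :: Bool -> b -> b
    seqBool True  b = b
    seqBool False b = b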
Obviously the entire IO monad is deeply magic, as is everything to do with concurrency and parallelism (par, forkIO, MVar), mutable storage, exception throwing and catching, querying the garbage collector and run-time stats, etc.
The IO monad can be considered a special case of the ST monad, which is also magic. (It allows truly mutable storage, which requires low-level stuff.)
The State monad, on the other hand, is completely ordinary user-level code that anybody can write. So is the Cont monad. So are the various exception / error monads.
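To back that up, here is a minimal continuation monad in plain Haskell; the names mirror Control.Monad.Cont, but nothing below needs compiler support:

    newtype Cont r a = Cont { runCont :: (a -> r) -> r }

    instance Functor (Cont r) where
      fmap f (Cont c) = Cont $ \k -> c (k . f)

    instance Applicative (Cont r) where
      pure a = Cont ($ a)
      Cont cf <*> Cont ca = Cont $ \k -> cf (\f -> ca (k . f))

    instance Monad (Cont r) where
      Cont c >>= f = Cont $ \k -> c (\a -> runCont (f a) k)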
Anything to do with syntax (do-blocks, list comprehensions) is hard-wired into the language definition. (Note, though, that some of these respond to LANGUAGE RebindableSyntax, which lets you change which functions the syntax binds to.) The same goes for deriving: the compiler "knows about" a handful of special classes and how to auto-generate instances for them. Deriving for newtypes works for any class, though; it just copies an instance from one type to another identical copy of that type. A sketch follows below.
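A minimal sketch of newtype deriving (Age is my own example): Eq, Ord and Show are stock-derivable, while Num is copied from Int by GeneralizedNewtypeDeriving:

    {-# LANGUAGE GeneralizedNewtypeDeriving #-}
    newtype Age = Age Int
      deriving (Eq, Ord, Show, Num)

    -- Num arithmetic comes straight from Int: Age 3 + Age 4 == Age 7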
Arrays are hard-wired, much as in every other programming language.
All of the foreign function interface is clearly hard-wired.
STM can be implemented in user code (I've done it), but it's currently hard-wired. (I imagine this gives a significant performance benefit. I haven't tried actually measuring it.) But, conceptually, that's just an optimisation; you can implement it using the existing lower-level concurrency primitives.

Data.Lens or Control.Lens [duplicate]

Possible duplicate: lenses, fclabels, data-accessor - which library for structure access and mutation is better
I'm going to use and learn a Lens package on my next Haskell project. I had almost decided on the Data.Lens package when I found this post which mentions van Laarhoven Lenses in the Control.Lens package.
I don't really understand the differences enough yet to decide which one to use. Which package would you suggest I learn/use on a real world project?
Control.Lens is almost certainly what you want. Data.Lens came first, and is simpler, but Control.Lens has many advantages, and is being actively developed.
Other than lenses, Control.Lens has many related types, like traversals (a traversal is like a lens that can refer to n values instead of just one), folds, read/modify-only lenses, indexed lenses, isomorphisms... It also comes with a much larger library of useful functions and predefined lenses for standard library types, Template Haskell to derive lenses, and a bunch of code for other things like generic zippers and uniplate-style generic traversal.
It's a big library -- you don't have to use all of it, but it's nice to have the thing you want already written.
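A small usage sketch (Point and its fields are my own example, not from the library):

    {-# LANGUAGE TemplateHaskell #-}
    import Control.Lens

    data Point = Point { _x :: Double, _y :: Double } deriving Show
    makeLenses ''Point   -- generates lenses x and y

    demo :: IO ()
    demo = do
      let p = Point 1 2
      print (p ^. x)          -- view a field: 1.0
      print (p & y .~ 10)     -- set a field
      print (p & x %~ (+ 1))  -- modify a field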
The main advantage of Data.Lens is that it's simpler, and as such doesn't require extensions beyond Haskell 98. But note that if you just want to export a Control.Lens-style lens from a library, you can do it without leaving Haskell 98 -- in fact, without depending on the package at all.
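For instance, a lens like the following is plain Haskell 98, yet Control.Lens's view, set and over all accept it (Person and its fields are hypothetical):

    module Person (Person, name) where

    data Person = Person { _name :: String, _age :: Int }

    -- A van Laarhoven lens: no lens-package import required.
    name :: Functor f => (String -> f String) -> Person -> f Person
    name f p = fmap (\n -> p { _name = n }) (f (_name p))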
If you're dealing with a Real World Project (tm), I'd highly recommend Control.Lens. Edwardk has put a lot of recent effort into it, and I'm sure he'd love to hear about your use case. In my opinion, this is going to become the canonical Lens library. I believe it's safe to say that everything you can do with Data.Lens, you can do with Control.Lens.
Data.Lens is much simpler and easier to work with. Control.Lens has a very large number of modules and uses language extensions to get the job done.

haskell generate FFI export wrapper code

I am writing some code in Haskell that has to be callable from C. Is there a tool or library in Haskell that simplifies writing FFI wrapper code for Haskell functions that need to be exported?
For example, given a Haskell function to be exported, the tool would take care of (i.e. generate the wrapper code for) mapping Haskell types to the correct Foreign.C types as required. It would also take care of generating the correct pointers when mapping [Int] types, etc., like what the questioner is attempting in Automatic conversion of types for FFI calls in Haskell. But is it available as a library?
I wrote a tool called Hs2lib to do this. If you're on Windows you're in luck: it'll do everything, including compiling the code to a DLL and generating C/C++ or C# wrappers. If you're on Linux, I'm afraid I haven't gotten the compilation step to work yet, but it still produces the required marshalling information and stubs. You can tell it to keep those by using the -T flag.
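In the meantime, the kind of boilerplate such a tool automates looks roughly like this when written by hand (sum_ints and the pointer-plus-length convention are my own example, not output from any tool):

    {-# LANGUAGE ForeignFunctionInterface #-}
    module Export where

    import Foreign.C.Types (CInt)
    import Foreign.Marshal.Array (peekArray)
    import Foreign.Ptr (Ptr)

    sumInts :: [Int] -> Int
    sumInts = sum

    -- The C side sees: HsInt32 sum_ints(HsInt32 *xs, HsInt32 len);
    sumIntsC :: Ptr CInt -> CInt -> IO CInt
    sumIntsC xs len = do
      ys <- peekArray (fromIntegral len) xs
      return (fromIntegral (sumInts (map fromIntegral ys)))

    foreign export ccall "sum_ints" sumIntsC :: Ptr CInt -> CInt -> IO CInt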
