What are module signatures in Haskell?


I recently came across a Haskell feature called "module signatures". As far as I can tell, they live in .hsig files and begin with the signature keyword instead of module.
The example syntax of such a file may look like
signature Str where
data Str
empty :: Str
append :: Str -> Str -> Str
However, I cannot imagine how and why one would use them. Could you explain which problems they solve and how to make proper use of them?
They strongly remind me of the module system in OCaml, which also has module signatures and separate implementations, but I can't tell how close the two concepts are. Are they related?

They are related to the OCaml module system, but with some important differences:
Signatures are defined within the language (in .hsig files) but unlike in OCaml they are not instantiated within the language. Instead, the package manager controls instantiation (currently, only Cabal provides that). Modules never know if they are importing an abstract signature or an actual module.
Implementation modules know nothing about signatures and do not reference them directly. Any existing module can implement a signature if the definitions happen to be compatible.
Instantiation is triggered by a coincidence of module name and signature name in the dependencies of some compilation unit (executable, library, test suite...). When the names coincide, a process called "signature matching" takes place that verifies that the types and definitions are compatible.
The "happy path" is that in your program you depend on some library having a signature "hole", and also on another library that provides an implementation module with the same name. Then signature matching happens automatically. When the names don't match, or we need multiple instantiations of the signature-using library, we have to rename signatures and/or modules in the mixins section of the Cabal file.
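To make the "happy path" concrete, here is a rough sketch of the three files involved. It reuses the Str signature from the question; the file names, the Concat module and its concatAll function are made up for illustration, and the String-based implementation is just one assumption about what a matching module could look like. These are separate files shown in one listing, not a single compilable module.

```haskell
-- File: Str.hsig (in the indefinite library; this is the "hole")
signature Str where

data Str
empty  :: Str
append :: Str -> Str -> Str

-- File: Concat.hs (same library; it imports the signature like any module)
module Concat (concatAll) where

import Str (Str, empty, append)

-- Fold a list of abstract strings into one, using only what the
-- signature promises.
concatAll :: [Str] -> Str
concatAll = foldr append empty

-- File: Str.hs (in a different library; one possible implementation)
-- Note that it never mentions the signature.
module Str (Str, empty, append) where

type Str = String

empty :: Str
empty = ""

append :: Str -> Str -> Str
append = (++)
```

The library that contains Str.hsig would list Str under the signatures field of its Cabal stanza. Because the implementing module here is also called Str, an executable that depends on both libraries should get signature matching without any renaming in mixins.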
As for why module signatures might be useful, consider bytestring, by far the most popular library for handling binary data in Haskell. There are others, though, for example stdio with its Bytes type.
Suppose you are writing your own library that uses binary data, and you don't want to force your users into either stdio or bytestring. What are your choices?
One would be to create something like a Bytelike class and parameterize all your functions with it. You would also need to add a type parameter to every data type that contains bytes.
Another would be to create a signature that defines an abstract binary data type and all the operations that are required of it. Your library would make use of the signature, and remain "indefinite" until the user depends both on your library and a suitable implementation when creating his own libraries.
From the perspective of the user, the typeclass solution is unsatisfactory. The user knows that he wants to use either ByteString or Bytes, just one of them. The decision will not depend on some runtime flag and will remain constant across the extent of his program. And yet he has to deal with a more complex API that reminds him of that already decided issue at every turn.
It's better if he makes the decision once, writes it in his .cabal file, and deals with a simpler API from then onwards.
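For comparison, the type class option mentioned above might look roughly like this. The class name Bytelike comes from the text; its methods, the ListBytes stand-in type and the Packet type are assumptions made up for this sketch so that it runs with base alone.

```haskell
import Data.Word (Word8)

-- Hypothetical class describing what the library needs from a byte container.
class Bytelike b where
  emptyB  :: b
  appendB :: b -> b -> b
  lengthB :: b -> Int

-- Stand-in implementation; a real user would write instances for
-- ByteString or Bytes instead.
newtype ListBytes = ListBytes [Word8]
  deriving (Eq, Show)

instance Bytelike ListBytes where
  emptyB = ListBytes []
  appendB (ListBytes xs) (ListBytes ys) = ListBytes (xs ++ ys)
  lengthB (ListBytes xs) = length xs

-- Every function in the library carries the constraint...
concatB :: Bytelike b => [b] -> b
concatB = foldr appendB emptyB

-- ...and every data type that stores bytes grows a type parameter.
data Packet b = Packet { header :: b, payload :: b }

packetSize :: Bytelike b => Packet b -> Int
packetSize p = lengthB (header p) + lengthB (payload p)
```

The extra constraint and the extra type parameter are exactly the API noise that the signature-based approach is meant to avoid.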

They're quite closely related to OCaml module signatures. They let you create a package that is missing some modules and declare that those modules, containing such and such types and values, must be supplied by the package's user. I haven't tested it myself, but I imagine that such a package works very much like an OCaml functor.

Related

How to define user-definable custom types at compile-time for a library using stack

After working on my Haskell project for a while, I decided to use multiple executables with a single library in stack. The reason is that I want multiple test executables, each showcasing some use of the single library I am writing. The problem arose when I tried to define a type that could be different for each executable.
To give you an idea of what I would like to accomplish:
I currently have a type synonym defined in my library:
type GameState = Int
The type Int works for executable 1; however, when compiling executable 2, I want AnotherDataType to take the place of Int:
type GameState = AnotherDataType
I tried setting the state type using a #define, with the help of the CPP language extension. The definition could be passed as a parameter at compile time for the library, but I couldn't make stack route a type defined in the executable section to the library (e.g. -DGAME_STATE_TYPE=Int). This is the closest idea I have for a clean solution that wouldn't require rewriting a lot of my code.
I also tried changing the type definition to:
data GameStateType a = GameStateType a
but I soon learned that this would require me to rewrite a lot of type classes, type definitions, monad stacks, and enable extensions such as MultiParamTypeClasses. I don't think that is a path I want to go given the time constraint, however, I would like to hear it anyways should you consider this the only/better option.

Can Haskell functions be serialized?

The best way to do it would be to get the representation of the function (if it can be recovered somehow). Binary serialization is preferred for efficiency reasons.
I think there is a way to do it in Clean, because otherwise it would be impossible to implement iTask, which relies on tasks (and therefore functions) being saved and resumed when the server is running again.
This must be important for distributed Haskell computations.
I'm not looking for parsing Haskell code at runtime as described here: Serialization of functions in Haskell.
I also need to serialize, not just deserialize.

Unfortunately, it's not possible with the current ghc runtime system.
Serialization of functions, and other arbitrary data, requires some low level runtime support that the ghc implementors have been reluctant to add.
Serializing functions requires that you can serialize anything, since arbitrary data (evaluated and unevaluated) can be part of a function (e.g., a partial application).

No. However, the CloudHaskell project is driving home the need for explicit closure serialization support in GHC. The closest thing CloudHaskell has to explicit closures is the distributed-static package. Another attempt is the HdpH closure representation. However, both use Template Haskell in the way Thomas describes below.
The limitation is a lack of static support in GHC, for which there is a currently unactioned GHC ticket. (Any takers?). There has been a discussion on the CloudHaskell mailing list about what static support should actually look like, but nothing has yet progressed as far as I know.
The closest anyone has come to a design and implementation is Jost Berthold, who has implemented function serialisation in Eden. See his IFL 2010 paper "Orthogonal Serialisation for Haskell". The serialisation support is baked into the Eden runtime system. (Now available as a separate library: packman. Not sure whether it can be used with GHC or needs a patched GHC as in the Eden fork...) Something similar would be needed for GHC. This is the serialisation support in Eden, in the version forked from GHC 7.4:
data Serialized a = Serialized { packetSize :: Int , packetData :: ByteArray# }
serialize :: a -> IO (Serialized a)
deserialize :: Serialized a -> IO a
So: one can serialize functions and data structures. There is a Binary instance for Serialized a, allowing you to checkpoint a long-running computation to file! (See Section 4.1).
Support for such a simple serialization API in the GHC base libraries would surely be the Holy Grail for distributed Haskell programming. It would likely simplify the composability between the distributed Haskell flavours (CloudHaskell, MetaPar, HdpH, Eden and so on...)

Check out Cloud Haskell. It has a concept called Closure which is used to send code to be executed on remote nodes in a type safe manner.

Eden probably comes closest and probably deserves a separate answer: (de)serialization of unevaluated thunks is possible, see https://github.com/jberthold/packman.
Deserialization is however limited to the same program (where program is a "compilation result"). Since functions are serialized as code pointers, previously unknown functions cannot be deserialized.
Possible usage:
storing unevaluated work for later
distributing work (but no sharing of new code)

A pretty simple and practical, but maybe not as elegant solution would be to (preferably have GHC automatically) compile each function into a separate module of machine-independent bytecode, serialize that bytecode whenever serialization of that function is required, and use the dynamic-loader or plugins packages, to dynamically load them, so even previously unknown functions can be used.
Since a module notes all its dependencies, those could then be (de)serialized and loaded too. In practice, serializing index numbers and attaching an indexed list of the bytecode blobs would probably be the most efficient.
I think as long as you compile the modules yourself, this is already possible right now.
As I said, it would not be very pretty though. Not to mention the generally huge security risk of de-serializing code from insecure sources to run in an unsecured environment. :-)
(No problem if it is trustworthy, of course.)
I’m not going to code it up right here, right now though. ;-)

How, why and when to use the ".Internal" modules pattern?

I've seen a couple of packages on Hackage which contain modules with .Internal as the last component of their name (e.g. Data.ByteString.Internal).
Those modules are usually not properly browsable in Haddock (though they may show up nevertheless) and should not be used by client code, but they contain definitions which are either re-exported from exposed modules or just used internally.
Now my question(s) about this library organization pattern are:
What problem(s) do those .Internal modules solve?
Are there other, preferable ways to work around those problems?
Which definitions should be moved to those .Internal modules?
What's the current recommended practice with respect to organizing libraries with the help of such .Internal modules?

Internal modules are generally modules that expose the internals of a package, that break package encapsulation.
To take ByteString as an example: When you normally use ByteStrings, they are used as opaque data types; a ByteString value is atomic, and its representation is uninteresting. All of the functions in Data.ByteString take values of ByteString, and never raw Ptr CChars or something.
This is a good thing; it means that the ByteString authors managed to make the representation abstract enough that all the details about the ByteString can be hidden completely from the user. Such a design leads to encapsulation of functionality.
The Internal modules are for people that wish to work with the internals of an encapsulated concept, to widen the encapsulation.
For example, you might want to make a new BitString data type, and you want users to be able to convert a ByteString into a BitString without copying any memory. In order to do this, you can't use opaque ByteStrings, because that doesn't give you access to the memory that represents the ByteString. You need access to the raw memory pointer to the byte data. This is what the Internal module for ByteStrings provides.
You should then make your BitString data type encapsulated as well, thus widening the encapsulation without breaking it. You are then free to provide your own BitString.Internal module, exposing the innards of your data type, for users that might want to inspect its representation in turn.
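As a rough illustration of the BitString idea, the sketch below wraps the memory behind a ByteString without copying it. BitString, fromByteString, bitLength and bitAt are hypothetical names invented for this example; the only piece taken from the real library is toForeignPtr from Data.ByteString.Internal, and the bit ordering (least significant bit first) is an arbitrary assumption.

```haskell
import Data.Bits (testBit)
import qualified Data.ByteString as B
import Data.ByteString.Internal (toForeignPtr)
import Data.Word (Word8)
import Foreign.ForeignPtr (ForeignPtr, withForeignPtr)
import Foreign.Storable (peekByteOff)

-- Hypothetical bit-level view: pointer, byte offset, length in bytes.
-- In a real library the constructor would live in BitString.Internal
-- and the public module would export only the type.
data BitString = BitString (ForeignPtr Word8) Int Int

-- Share the ByteString's buffer instead of copying it.
fromByteString :: B.ByteString -> BitString
fromByteString bs =
  let (fp, off, len) = toForeignPtr bs
  in BitString fp off len

bitLength :: BitString -> Int
bitLength (BitString _ _ len) = 8 * len

-- Read bit i, counting from the least significant bit of the first byte.
bitAt :: BitString -> Int -> IO Bool
bitAt (BitString fp off len) i
  | i < 0 || i >= 8 * len = ioError (userError "bitAt: index out of range")
  | otherwise = withForeignPtr fp $ \p -> do
      byte <- peekByteOff p (off + i `div` 8) :: IO Word8
      pure (testBit byte (i `mod` 8))
```

None of this would be possible through the opaque Data.ByteString API alone, which is the point of exposing the Internal module.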
If someone does not provide an Internal module (or similar), you can't gain access to the module's internal representation, and the user writing e.g. BitString is forced to (ab)use things like unsafeCoerce to cast memory pointers, and things get ugly.
The definitions that should be put in an Internal module are the actual data declarations for your data types:
-- File: Bla/Internal.hs
module Bla.Internal where
data Bla = Blu Int | Bli String -- the constructors are visible to anyone importing this module
-- ...
-- File: Bla.hs
module Bla (Bla, makeBla) where -- ONLY export the Bla type, not the constructors
import Bla.Internal
makeBla :: String -> Bla -- Some function only dealing with the opaque type
makeBla = undefined

@dflemstr is right, but not explicit about the following point. Some authors put the internals of a package in a .Internal module and then don't expose that module via cabal, thereby making it inaccessible to client code. This is a bad thing [1].
Exposed .Internal modules help to communicate different levels of abstraction implemented by a module. The alternatives are:
1. Expose implementation details in the same module as the abstraction.
2. Hide implementation details by not exposing them in module exports or via cabal.
(1) makes the documentation confusing, and makes it hard for the user to tell the transition between his code respecting a module's abstraction and breaking it. This transition is important: it is analogous to removing a parameter to a function and replacing its occurrences with a constant, a loss of generality.
(2) makes the above transition impossible and hinders the reuse of code. We would like to make our code as abstract as possible, but (cf. Einstein) no more so, and the module author does not have as much information as the module user, so is not in a position to decide what code should be inaccessible. See the link for more on this argument, as it is somewhat peculiar and controversial.
Exposing .Internal modules provides a happy medium which communicates the abstraction barrier without enforcing it, allowing users to easily restrict themselves to abstract code, but allowing them to "beta expand" the module's use if the abstraction breaks down or is incomplete.
[1] There are, of course, complications to this puristic judgement. An internal change can now break client code, and authors now have a larger obligation to stabilize their implementation as well as their interface. Even if it is properly disclaimed, users is users and gotsta be supported, so there is some appeal to hiding the internals. It begs for a custom version policy which differentiates between .Internal and interface changes, but fortunately this is consistent with (but not explicit in) the versioning policy. "Real code" is also notoriously lazy, so exposing an .Internal module can provide an easy out when there was an abstract way to define code that was just "harder" (but ultimately supports the community's reuse). It can also discourage reporting an omission in the abstract interface that really should be pushed to the author to fix.

The idea is that you can have the "proper", stable API which you export from MyModule, and this is the preferred and documented way to use the library.
In addition to the public API, your module probably has private data constructors and internal helper functions etc. The MyModule.Internal submodule can be used to export those internal functions instead of keeping them completely locked inside the module.
It lets users of your library access the internals if they have needs that you didn't foresee, but with the understanding that they are accessing an internal API that doesn't have the same implicit guarantees as the public one.
It lets you access the internal functions and constructors for e.g. unit-testing purposes.

One extension (or possibly clarification) to what shang and dflemstr said: if you have internal definitions (data types whose constructors aren't exported, etc.) that you want to access from multiple modules which are exported, then you typically create such an .Internal module which isn't exposed at all (i.e. listed in Other-Modules in the .cabal file).
However, this sometimes does leak out when inspecting types in GHCi (e.g. when using a function whose type refers to types that aren't in scope; I can't think of an instance off the top of my head, but it does happen).

Importing modules as a function, with string as input

I want to make a function called 'load' which imports definitions of functions from another file. I know how to import modules, but in my program I want the definitions of the functions to change depending on which module is 'loaded' with this new function. Is there a way to do this? Is there a better way to write my program so that this is not necessary?
I think its type signature would look something like:
load :: String -> IO ()
where the string is the name of the module to be loaded (and the module is in the same directory).
Edit: Thanks for all the replies. Most people agree that this is not the best way to do what I want. Instead, is there a way to declare a global variable from within an I/O program? That is, I want it so that if I type (function "thing") into a function of type String -> IO (), I can still type 'thing' into GHCi to get the value assigned to it... Any suggestions?

There is almost certainly a better way to write your program so that this is not necessary. It's hard to say what without knowing more details about your situation, though. You could, for instance, represent the generic interface each module implements as a data-type, and have each module export a value of that type with the implementation.
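A minimal sketch of that data-type approach, under made-up names: Backend, english, french and the fields are all hypothetical, and load here is a pure lookup rather than the IO action from the question. In a real program each Backend value would be exported from its own module.

```haskell
-- The generic interface that every "module" implements.
data Backend = Backend
  { backendName :: String
  , greet       :: String -> String
  , combine     :: Int -> Int -> Int
  }

-- Two implementations; imagine each one living in a separate module.
english :: Backend
english = Backend
  { backendName = "english"
  , greet       = ("Hello, " ++)
  , combine     = (+)
  }

french :: Backend
french = Backend
  { backendName = "french"
  , greet       = ("Bonjour, " ++)
  , combine     = (*)
  }

-- All implementations known at compile time, indexed by name.
backends :: [(String, Backend)]
backends = [(backendName b, b) | b <- [english, french]]

-- Pick an implementation by name at runtime; Nothing if it is unknown.
load :: String -> Maybe Backend
load name = lookup name backends
```

The rest of the program then takes a Backend as an ordinary argument, so which definitions are "loaded" becomes a runtime value instead of a change to the import list.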
Basically, the set of loaded modules is a static, compile-time property, so it makes no sense to want your program's behaviour to change based on its contents. Are you trying to write a library? Your users probably won't appreciate it doing such evil magic to their import lists :) (And it probably isn't possible without Template Haskell in that case, anyway.)
The exception is if you're trying to implement a Haskell tool (e.g. REPL, IDE, etc.) or trying to do plugins; i.e. dynamically-loaded modules of Haskell source code to integrate into your Haskell program. The first thing to try for those should be hint, but you may find you need something more advanced; in that case, the GHC API is probably your best bet. plugins used to be the de-facto standard in this area, but it doesn't seem to compile with GHC 7; you might want to check out direct-plugins, a simplified implementation of a similar interface that does.
mueval might be relevant; it's designed for executing short (one-line) snippets of Haskell code in a safe sandbox, as used by lambdabot.

Unless you're building a Haskell IDE or something like that, you most likely don't need this (^1).
But in case you do, there is always the hint package, which allows you to embed a Haskell interpreter into your program. This allows you both to load Haskell modules and to convert strings into Haskell values at runtime. There is a nice example of how to use it here.
^1: If you're looking for a way to make things polymorphic, i.e. changing some, but not all, of the definitions in your code, you're probably looking for typeclasses.

With regards to your edit, perhaps you might be interested in IORef.
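A small sketch of what that could look like. The names thing, setThing and getThing are made up for this example, and the top-level unsafePerformIO idiom is a commonly used workaround for global state rather than something the language blesses; passing the IORef around explicitly is the safer design.

```haskell
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import System.IO.Unsafe (unsafePerformIO)

-- A top-level mutable cell. The NOINLINE pragma keeps GHC from
-- inlining the definition and thereby creating several cells.
{-# NOINLINE thing #-}
thing :: IORef String
thing = unsafePerformIO (newIORef "")

-- Store a value from inside IO...
setThing :: String -> IO ()
setThing = writeIORef thing

-- ...and read it back later, for example from the GHCi prompt.
getThing :: IO String
getThing = readIORef thing
```

In GHCi, running setThing "hello" and then getThing should give back the stored string for as long as the module stays loaded.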

How do I do automatic data serialization of data objects?

One of the huge benefits of languages that have some sort of reflection/introspection is that objects can be automatically constructed from a variety of sources.
For example, in Java I can use the same objects for persisting to a db (with Hibernate), serializing to XML (with JAXB), and serializing to JSON (json-lib). You can do the same in Ruby and Python, usually by following some simple rules for properties (or, in Java, by adding annotations).
Thus I don't need lots of "Domain Transfer Objects". I can concentrate on the domain I am working in.
It seems that in very strict FP languages like Haskell and OCaml this is not possible.
Particularly Haskell. The only thing I have seen is doing some sort of preprocessing or meta-programming (OCaml). Is it just accepted that you have to do all the transformations from the bottom upwards?
In other words, you have to do lots of boring work to turn a data type in Haskell into a JSON/XML/DB Row object and back again into a data object.

I can't speak to OCaml, but I'd say that the main difficulty in Haskell is that deserialization requires knowing the type in advance--there's no universal way to mechanically deserialize from a format, figure out what the resulting value is, and go from there, as is possible in languages with unsound or dynamic type systems.
Setting aside the type issue, there are various approaches to serializing data in Haskell:
The built-in type classes Read/Show (de)serialize algebraic data types and most built-in types as strings. Well-behaved instances should generally be such that read . show is equivalent to id, and that the result of show can be parsed as Haskell source code constructing the serialized value.
Various serialization packages can be found on Hackage; typically these require that the type to be serialized be an instance of some type class, with the package providing instances for most built-in types. Sometimes they merely require an automatically derivable instance of the type-reifying, reflective metaprogramming Data class (the charming fully qualified name for which is Data.Data.Data), or provide Template Haskell code to auto-generate instances.
For truly unusual serialization formats--or to create your own package like the previously mentioned ones--one can reach for the biggest hammer available, sort of a "big brother" to Read and Show: parsing and pretty-printing. Numerous packages are available for both, and while it may sound intimidating at first, parsing and pretty-printing are in fact amazingly painless in Haskell.
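The first option in the list above, Read and Show, needs nothing beyond base. Here is a small sketch with made-up types (Colour and Person are purely illustrative):

```haskell
import Text.Read (readMaybe)

data Colour = Red | Green | Blue
  deriving (Show, Read, Eq)

data Person = Person
  { name      :: String
  , age       :: Int
  , favourite :: Colour
  }
  deriving (Show, Read, Eq)

-- Serialize to a String that is also valid Haskell source.
serialize :: Person -> String
serialize = show

-- Deserialize; readMaybe returns Nothing on malformed input instead of
-- throwing like read does.
deserialize :: String -> Maybe Person
deserialize = readMaybe
```

Note that the result type has to be known at the call site: deserialize only works because its signature fixes Person, which is the "knowing the type in advance" issue described above.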
A glance at Hackage indicates that serialization packages already exist for various formats, including binary data, JSON, YAML, and XML, though I've not used any of them so I can't personally attest to how well they work. Here's a non-exhaustive list to get you started:
binary: Performance-oriented serialization to lazy ByteStrings
cereal: Similar to binary, but a slightly different interface and uses strict ByteStrings
genericserialize: Serialization via built-in metaprogramming, output format is extensible, includes R5RS sexp output.
json: Lightweight serialization of JSON data
RJson: Serialization to JSON via built-in metaprogramming
hexpat-pickle: Combinators for serialization to XML, using the "hexpat" package
regular-xmlpickler: Serialization to XML of recursive data structures using the "regular" package
The only other problem is that, inevitably, not all types will be serializable--if nothing else, I suspect you're going to have a hard time serializing polymorphic types, existential types, and functions.

For what it's worth, I think the pre-processor solution found in OCaml (as exemplified by sexplib, binprot and json-wheel among others) is pretty great (and I think people do very similar things with Template Haskell). It's far more efficient than reflection, and can also be tuned to individual types in a natural way. If you don't like the auto-generated serializer for a given type foo, you can always just write your own, and it fits beautifully into the auto-generated serializers for types that include foo as a component.
The only downside is that you need to learn camlp4 to write one of these for yourself. But using them is quite easy, once you get your build-system set up to use the preprocessor. It's as simple as adding with sexp to the end of a type definition:
type t = { foo: int; bar: float }
with sexp
and now you have your serializer.

You wanted
to do lot of boring work to turn a data type in haskell into JSON/XML/DB Row object and back again into a data object.
There are many ways to serialize and unserialize data types in Haskell. You can use for example,
Data.Binary
Text.JSON
as well as other common formats (protocol buffers, thrift, xml)
Each package often/usually comes with a macro or deriving mechanism to allow you to e.g. derive JSON. For Data.Binary for example, see this previous answer: Erlang's term_to_binary in Haskell?
The general answer is: we have many great packages for serialization in Haskell, and we tend to use the existing class 'deriving' infrastructure (with either generics or Template Haskell macros to do the actual deriving).
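As one example of the generics route, the binary package can fill in its Binary instances from a derived Generic instance. The Point and Shape types and the toBytes/fromBytes helpers below are invented for illustration.

```haskell
{-# LANGUAGE DeriveGeneric #-}

import Data.Binary (Binary, decode, encode)
import qualified Data.ByteString.Lazy as BL
import GHC.Generics (Generic)

data Point = Point { px :: Int, py :: Int }
  deriving (Show, Eq, Generic)

-- Empty instance body: put and get come from the generic defaults.
instance Binary Point

data Shape
  = Circle Point Double
  | Polygon [Point]
  deriving (Show, Eq, Generic)

instance Binary Shape

toBytes :: Shape -> BL.ByteString
toBytes = encode

-- decode calls error on malformed input; decodeOrFail is the
-- non-throwing alternative.
fromBytes :: BL.ByteString -> Shape
fromBytes = decode
```

The domain types stay free of serialization code apart from the deriving clause and the one-line instance declarations.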

My understanding is that the simplest way to serialize and deserialize in Haskell is to derive Read and Show. This is simple but doesn't fulfil your requirements.
However there are HXT and Text.JSON which seem to provide what you need.

The usual approach is to employ Data.Binary. This provides the basic serialisation capability. Binary instances for data types are easy to write and can easily be built out of smaller units.
If you want to generate the instances automatically then you can use Template Haskell. I don't know of any package to do this, but I wouldn't be surprised if one already exists.
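To show what "built out of smaller units" can mean in practice, here is a hand-written pair of Binary instances. The Colour and Pixel types and the one-byte tag encoding are assumptions made up for this sketch, not a format the package prescribes.

```haskell
import Data.Binary (Binary (..), decode, encode)
import Data.Binary.Get (getWord8)
import Data.Binary.Put (putWord8)

data Colour = Red | Green | Blue
  deriving (Show, Eq)

-- Smallest unit: one tag byte per constructor.
instance Binary Colour where
  put Red   = putWord8 0
  put Green = putWord8 1
  put Blue  = putWord8 2
  get = do
    tag <- getWord8
    case tag of
      0 -> pure Red
      1 -> pure Green
      2 -> pure Blue
      _ -> fail "Colour: unknown tag"

data Pixel = Pixel
  { pixelX      :: Int
  , pixelY      :: Int
  , pixelColour :: Colour
  }
  deriving (Show, Eq)

-- Larger unit: reuses the instances for Int and Colour field by field.
instance Binary Pixel where
  put (Pixel x y c) = put x >> put y >> put c
  get = Pixel <$> get <*> get <*> get
```

Lists, tuples and Maybe already have instances, so a value such as [Pixel] can be passed to encode and decode without further code.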
