How small should I make make modules in Haskell? - haskell

I'm writing a snake game in Haskell. These are some of the things I have:
A Coord data type
A Line data type
A Rect data type
A Polygon type class, which allows me to get a Rect as a series of lines ([Line]).
An Impassable type class that allows me to get a Line as a series of Coords ([Coord]) so that I can detect collisions between other Impassables.
A Draw type class for anything that I want to draw to the screen (HSCurses).
Finally I'm using QuickCheck so I want to declare Arbitrary instances for a lot of these things.
Currently I have a lot of these in separate modules so I have lots of small modules. I've noticed that I have to import a lot of them for each other so I'm kind of wondering what the point was.
I'm particularly confused about Arbitrary instances. When using -Wall I get warnings about orphaned instances when I but those instances together in one test file, my understanding is that I can avoid that warning by putting those instances in the same module as where the data type is defined but then I'll need to import Test.QuickCheck for all those modules which seems silly because QuickCheck should only be required when building the test executable.
Any advice on the specific problem with QuickCheck would be appreciated as would guidance on the more general problem of how/where programs should be divided into modules.

You can have your cake and eat it too. You can re-export modules.
module Geometry
( module Coord, module Line, module Rect, module Polygon, module Impassable )
where
I usually use a module whenever I have a complete abstraction -- i.e. when a data type's meaning differs from its implementation. Knowing little about your code, I would probably group Polygon and Impassable together, perhaps making a Collision data type to represent what they return. But Coord, Line, and Rect seem like good abstractions and they probably deserve their own modules.

For testing purposes, I use separate modules for the Arbitrary instances. Although I generally avoid orphan instances, these modules only get built when building the test executable so I don't mind the orphans or that it's not -Wall clean. You can also use -fno-warn-orphans to disable just this warning message.

I generally put more emphasis on the module interface as defined by the functions it exposes rather than the data types it exposes. Do some of your types share a common set of functions? Then I would put them in the same module.
But my practise is probably not the best since I usually write small programs. I would advise looking at some code from Hackage to see what package maintainers do.
If there were a way to sort packages by community rating or number of downloads, that would be a good place to start. (I thought there was, but now that I look for it, I can't find it.) Failing that, look at packages that you already use.

One solution with QuickCheck is to use the C preprocessor to selectively enable the Arbitrary instances when you are testing. You put the Arbitrary instances straight into your main modules but wrap them with preprocessor macros, then put a "test" flag into your Cabal file.

Related

Leave free access to internal intermediary functions in Haskell library?

I'm writing a numerical optimisation library in Haskell, with the aim of making functions like a gradient descent algorithm available for users of the library. In writing these relatively complex functions, I write intermediary functions, such as a function that performs just one step of gradient descent. Some of these intermediary functions perform tasks that no user of the library could ever have need for. Some are even quite cryptic, but make sense when used by a bigger function.
Is it common practice to leave these intermediary functions available to library users? I have considered moving these to an "Internal" library, but moving small functions into a whole different library from the main functions using them seems like a bad idea for code legibility. I'd also quite like to test these smaller functions as well as the main functions for debugging purposes down the line - and ideally would like to test both in the same place, so that complicates things even more.
I'm unsurprisingly using Cabal for the library so answers in that context as well would be helpful if that's easier.
You should definitely not just throw such internal functions in the export of your package's trunk module, together with the high-level ones. It makes the interface/haddocks hard to understand, and also poses problems if users come to depend on low-level details that may easily change in future releases.
So I would keep these functions in an “internal” module, which the “public” module imports but only re-exports those that are indended to be used:
Public
module Numeric.Hegash.Optimization (optimize) where
import Numeric.Hegash.Optimization.Internal
Private
module Numeric.Hegash.Optimization.Internal where
gradientDesc :: ...
gradientDesc = ...
optimize :: ...
optimize = ... gradientDesc ...
A more debatable matter is whether you should still allow users to load the Internal module, i.e. whether you should put it in the exposed-modules or other-modules section of your .cabal file. IMO it's best to err on the “exposed” side, because there could always be valid use cases that you didn't foresee. It also makes testing easier. Just ensure you clearly document that the module is unstable. Only functions that are so deeply in the implementation details that they are basically impossible to use outside of the module should not be exposed at all.
You can selectively export functions from a module by listing them in the header. For example, if you have functions gradient and gradient1 and only want to export the former, you can write:
module Gradient (gradient) where
You can also incorporate the intermediary functions into their parent functions using where to limit the scope to just the parent function. This will also prevent the inner function from being exported:
gradient ... =
...
where
gradient1 ... = ...

What are module signatures in Haskell?

I have recently found Haskell's feature called "module signatures". As I have discovered they are put in .hsig files and begin with signature keyword instead of module.
The example syntax of such a file may look like
signature Str where
data Str
empty :: Str
append :: Str -> Str -> Str
However, I cannot imagine how and why one would use them. Could you explain me which problems do they solve and how to properly make use of them?
They strongly remind me the module system that one can see in OCaml (link), which also has modules signatures and separate implementations, but I can't decide how close are these two concepts. Is it somehow related?
They are related to the OCaml module system, but with some important differences:
Signatures are defined within the language (in .hsig files) but unlike in OCaml they are not instantiated within the language. Instead, the package manager controls instantiation (currently, only Cabal provides that). Modules never know if they are importing an abstract signature or an actual module.
Implementation modules know nothing about signatures and do not not reference them directly. Any existing module can implement a signature if the definitions happen to be compatible.
Instantiation is triggered by a coincidence of module name and signature name in the dependencies of some compilation unit (executable, library, test suite...) When the names coincide, a process called "signature matching" takes place that verifies that the types and definitions are compatible.
The "happy path" is that in your program you depend on some library having a signature "hole", and also on another library that provides an implementation module with the same name. Then signature matching happens automatically. When the names don't match, or we need multiple instantiations of the signature-using library, we have to rename signatures and/or modules in the mixins section of the Cabal file.
As for why module signatures might be useful, consider bytestring, the most popular library by far for handling binary data in Haskell. But there are others, for example stdio with its Bytes type.
Suppose you are writing your own library that uses binary data, and you don't want to force your users into either stdio or bytestring. What are your choices?
One would be to create something like a Bytelike class and parameterize all your functions with it. You would also need to add a type parameter to every data type that contains bytes.
Another would be to create a signature that defines an abstract binary data type and all the operations that are required of it. Your library would make use of the signature, and remain "indefinite" until the user depends both on your library and a suitable implementation when creating his own libraries.
From the perspective of the user, the typeclass solution is unsatisfactory. The user knows that he wants to use either ByteString or Bytes, just one of them. The decision will not depend on some runtime flag and will remain constant across the extent of his program. And yet he has to deal with a more complex API that reminds him of that already decided issue at every turn.
It's better if he makes the decision once, writes it in his .cabal file, and deals with a simpler API from then onwards.
As described here, they're quite closely related to OCaml module signatures. They allow you to create a package missing some modules and say these modules, containing such and such types and values, should be delivered by package's user. I haven't tested it myself, but I imagine that such a package works very much like an OCaml functor.

Benefit of importing specific parts of a Haskell module

Except from potential name clashes -- which can be got around by other means -- is there any benefit to importing only the parts from a module that you need:
import SomeModule (x, y, z)
...verses just importing all of it, which is terser and easier to maintain:
import SomeModule
Would it make the binary smaller, for instance?
Name clashes and binary size optimization are just two of the benefits you can get. Indeed, it is a good practice to always identify what you want to get from the outside world of your code. So, whenever people look at your code they will know what exactly your code requesting.
This also gives you a very good chance to creat mocking solutions for test, since you can work through the list of imports and write mockings for them.
Unfortunately, in Haskell the type class instances are not that easy. They are imported implicitly and so can creates conflicts, also they may makes mocking harder, since there is no way to specify specific class instances only. Hopefully this can be fixed in future versions of Haskell.
UPDATE
The benifits I listed above (code maintenance and test mocking) are not limited to Haskell. Actually, it is also common practice in Java, as I know. In Java you can just import a single class, or even a single static variable/method. Unfortunately again, you still cannot selectively import member functions.
No, it's only for the purpose of preventing name clashes. The other mechanism for preventing name clashes - namely import qualified - results in more verbose (less readable) code.
It wouldn't make the binary smaller - consider that functions in a given module all reference each other, usually, so they need to be compiled together.

How, why and when to use the ".Internal" modules pattern?

I've seen a couple of package on hackage which contain module names with .Internal as their last name component (e.g. Data.ByteString.Internal)
Those modules are usually not properly browsable (but they may show up nevertheless) in Haddock and should not be used by client code, but contain definitions which are either re-exported from exposed modules or just used internally.
Now my question(s) to this library organization pattern are:
What problem(s) do those .Internal modules solve?
Are there other preferable ways to workaround those problems?
Which definitions should be moved to those .Internal modules?
What's the current recommended practice with respect to organizing libraries with the help of such .Internal modules?
Internal modules are generally modules that expose the internals of a package, that break package encapsulation.
To take ByteString as an example: When you normally use ByteStrings, they are used as opaque data types; a ByteString value is atomic, and its representation is uninteresting. All of the functions in Data.ByteString take values of ByteString, and never raw Ptr CChars or something.
This is a good thing; it means that the ByteString authors managed to make the representation abstract enough that all the details about the ByteString can be hidden completely from the user. Such a design leads to encapsulation of functionality.
The Internal modules are for people that wish to work with the internals of an encapsulated concept, to widen the encapsulation.
For example, you might want to make a new BitString data type, and you want users to be able to convert a ByteString into a BitString without copying any memory. In order to do this, you can't use opaque ByteStrings, because that doesn't give you access to the memory that represents the ByteString. You need access to the raw memory pointer to the byte data. This is what the Internal module for ByteStrings provides.
You should then make your BitString data type encapsulated as well, thus widening the encapsulation without breaking it. You are then free to provide your own BitString.Internal module, exposing the innards of your data type, for users that might want to inspect its representation in turn.
If someone does not provide an Internal module (or similar), you can't gain access to the module's internal representation, and the user writing e.g. BitString is forced to (ab)use things like unsafeCoerce to cast memory pointers, and things get ugly.
The definitions that should be put in an Internal module are the actual data declarations for your data types:
module Bla.Internal where
data Bla = Blu Int | Bli String
-- ...
module Bla (Bla, makeBla) where -- ONLY export the Bla type, not the constructors
import Bla.Internal
makeBla :: String -> Bla -- Some function only dealing with the opaque type
makeBla = undefined
#dflemstr is right, but not explicit about the following point. Some authors put internals of a package in a .Internal module and then don't expose that module via cabal, thereby making it inaccessible to client code. This is a bad thing1.
Exposed .Internal modules help to communicate different levels of abstraction implemented by a module. The alternatives are:
Expose implementation details in the same module as the abstraction.
Hide implementation details by not exposing them in module exports or via cabal.
(1) makes the documentation confusing, and makes it hard for the user to tell the transition between his code respecting a module's abstraction and breaking it. This transition is important: it is analogous to removing a parameter to a function and replacing its occurrences with a constant, a loss of generality.
(2) makes the above transition impossible and hinders the reuse of code. We would like to make our code as abstract as possible, but (cf. Einstein) no more so, and the module author does not have as much information as the module user, so is not in a position to decide what code should be inaccessible. See the link for more on this argument, as it is somewhat peculiar and controversial.
Exposing .Internal modules provides a happy medium which communicates the abstraction barrier without enforcing it, allowing users to easily restrict themselves to abstract code, but allowing them to "beta expand" the module's use if the abstraction breaks down or is incomplete.
1 There are, of course, complications to this puristic judgement. An internal change can now break client code, and authors now have a larger obligation to stabilize their implementation as well as their interface. Even if it is properly disclaimed, users is users and gotsta be supported, so there is some appeal to hiding the internals. It begs for a custom version policy which differentiates between .Internal and interface changes, but fortunately this is consistent with (but not explicit in) the versioning policy. "Real code" is also notoriously lazy, so exposing an .Internal module can provide an easy out when there was an abstract way to define code that was just "harder" (but ultimately supports the community's reuse). It can also discourage reporting an omission in the abstract interface that really should be pushed to the author to fix.
The idea is that you can have the "proper", stabile API which you export from MyModule and this is the preferred and documented way to use the library.
In addition to the public API, your module probably has private data constructors and internal helper functions etc. The MyModule.Internal submodule can be used to export those internal functions instead of keeping them completely locked inside the module.
It lets the users of your libary to access the internals if they have needs that you didn't foresee, but with the understanding that they are accessing an internal API that doesn't have the same implicit guarantees as the public one.
It lets you access the internal functions and constructors for e.g. unit-testing purposes.
One extension (or possibly clarification) to what shang and dflemstr said: if you have internal definitions (data types whose constructors aren't exported, etc.) that you want to access from multiple modules which are exported, then you typically create such an .Internal module which isn't exposed at all (i.e. listed in Other-Modules in the .cabal file).
However, this sometimes does leak out when doing types in ghci (e.g. when using a function but where some of the types it refers to aren't in scope; can't think of an instance where this happens off the top of my head, but it does).

Importing modules as a function, with string as input

I want to make a function called 'load' which imports definitions of functions from another file. I know how to import modules, but in my program I want the definitions of the functions to change depending on which module is 'loaded' with this new function. Is there a way to do this? Is there a better way to write my program so that this is not necessary?
I think it's type signature would look something like:
load :: String -> IO ()
where the string is the name of the module to be loaded (and the module is in the same directory).
Edit: Thanks for all the replies. Most people agree that this is not the best way to do what I want. Instead, is there a way to declare a global variable from within an I/O program. That is, I want it so that if I type (function "thing") into a function of type String -> IO(), I can still type 'thing' into GHCi to get the value assigned to it... Any suggestions?
There is almost certainly a better way to write your program so that this is not necessary. It's hard to say what without knowing more details about your situation, though. You could, for instance, represent the generic interface each module implements as a data-type, and have each module export a value of that type with the implementation.
Basically, the set of loaded modules is a static, compile-time property, so it makes no sense to want your program's behaviour to change based on its contents. Are you trying to write a library? Your users probably won't appreciate it doing such evil magic to their import lists :) (And it probably isn't possible without Template Haskell in that case, anyway.)
The exception is if you're trying to implement a Haskell tool (e.g. REPL, IDE, etc.) or trying to do plugins; i.e. dynamically-loaded modules of Haskell source code to integrate into your Haskell program. The first thing to try for those should be hint, but you may find you need something more advanced; in that case, the GHC API is probably your best bet. plugins used to be the de-facto standard in this area, but it doesn't seem to compile with GHC 7; you might want to check out direct-plugins, a simplified implementation of a similar interface that does.
mueval might be relevant; it's designed for executing short (one-line) snippets of Haskell code in a safe sandbox, as used by lambdabot.
Unless you're building a Haskell IDE or something like that, you most likely don't need this (^1).
But, in the case you do, there is always the hint-package, which allows you to embed a haskell interpreter into your program. This allows you to both load haskell modules and to convert strings into haskell values at runtime. There is a nice example of how to use it here
^1: If you're looking for a way to make things polymorphic, i.e. changing some, but not all definitions of in your code, you're probably looking for typeclasses.
With regards to your edit, perhaps you might be interested in IORef.

Resources