I'm writing a numerical optimisation library in Haskell, with the aim of making functions like a gradient descent algorithm available for users of the library. In writing these relatively complex functions, I write intermediary functions, such as a function that performs just one step of gradient descent. Some of these intermediary functions perform tasks that no user of the library could ever have need for. Some are even quite cryptic, but make sense when used by a bigger function.
Is it common practice to leave these intermediary functions available to library users? I have considered moving these to an "Internal" library, but moving small functions into a whole different library from the main functions using them seems like a bad idea for code legibility. I'd also quite like to test these smaller functions as well as the main functions for debugging purposes down the line - and ideally would like to test both in the same place, so that complicates things even more.
I'm unsurprisingly using Cabal for the library so answers in that context as well would be helpful if that's easier.
You should definitely not just throw such internal functions in the export of your package's trunk module, together with the high-level ones. It makes the interface/haddocks hard to understand, and also poses problems if users come to depend on low-level details that may easily change in future releases.
So I would keep these functions in an “internal” module, which the “public” module imports but only re-exports those that are indended to be used:
Public
module Numeric.Hegash.Optimization (optimize) where
import Numeric.Hegash.Optimization.Internal
Private
module Numeric.Hegash.Optimization.Internal where
gradientDesc :: ...
gradientDesc = ...
optimize :: ...
optimize = ... gradientDesc ...
A more debatable matter is whether you should still allow users to load the Internal module, i.e. whether you should put it in the exposed-modules or other-modules section of your .cabal file. IMO it's best to err on the “exposed” side, because there could always be valid use cases that you didn't foresee. It also makes testing easier. Just ensure you clearly document that the module is unstable. Only functions that are so deeply in the implementation details that they are basically impossible to use outside of the module should not be exposed at all.
You can selectively export functions from a module by listing them in the header. For example, if you have functions gradient and gradient1 and only want to export the former, you can write:
module Gradient (gradient) where
You can also incorporate the intermediary functions into their parent functions using where to limit the scope to just the parent function. This will also prevent the inner function from being exported:
gradient ... =
...
where
gradient1 ... = ...
I know import qualified names has benefit of avoiding name conflicts. I'm asking purely from readability point of view.
Not familiar with haskell standard libraries, one thing I found annoying when reading haskell code (mostly from books and tutorials online) is that when I come across a function, I don't know if it belongs to a imported module or will be defined by user later.
Coming from a C++ background, it's usually seen as a good practice to call standard library function with namespace, for example std::find. Is it the same for haskell? If not then how do you overcome the problem that I mentioned above?
From Haskell style guide:
Always use explicit import lists or qualified imports for standard and
third party libraries. This makes the code more robust against changes
in these libraries. Exception: The Prelude.
So, the answer is yes. Using qualified import is considered a good practice for standard and third party libraries except the Prelude. But for infix function with symbols (something like <|*|>) you may want to import it explicitly as qualified import doesn't look nice on it.
I'm not too fond of qualified names, IMO they rather clutter the code. The only modules that should always be imported qualified are those that use names clashing with prelude functions – these normally have an explicit recommendation for doing so, in the documentation.
For widespreadly used modules such as Control.Applicative, there's not much reason not to import unqualified; most programmers should know all that's in there. For modules from less well-known packages that do something very specific, or to avoid clashes of a single name, you can use an explicit import list, e.g. import Data.List (sortBy), import System.Random.Shuffle (shuffleM) – this way, you don't have to litter your code with qualifiers, yet looking up an identifier in the imports section tells you immediately where it comes from (this is analogous to using std::cout;). But honestly, I find it even more convenient to just load the module into ghci and use
*ClunkyModule> :i strangeFunction
to see where it's defined.
There's one good point to be made about qualified imports or explicit import lists that I tend to neglect: they make your packages more future-proof. If a new version of some module stops exporting an item you need, or another module introduces a clashing name, then an explicit import will immediately point you to the problem.
I feel the same as you. If I see functionName in a module I'm unfamiliar with then I have no idea which one of the many imports it comes from. "Unfamiliar module" here can also mean one I myself wrote in the past! My current style is like the following, but it's by no means universally accepted. Users of this style are probably in a very small minority.
import qualified Long.Path.To.Module as M
... use M.functionName ...
or if I want more clarity
import qualified Long.Path.To.Module as Module
... use Module.functionName ...
Very rarely I will fully qualify
import qualified Long.Path.To.Module
... use Long.Path.To.Module.functionName ...
However, I almost never qualify infix operators.
My own set of rules.
1) Try not to import anything as qualified without renaming. B.ByteString is much more readable than Data.ByteString.ByteString. Exception can be made for truly ubiquitous modules, such as Control.Monad.
2) Don't import the whole module, import specific functions/types/classes, unless there are too many of them. That way if someone wants to find out where some function came from, he can just search the function's name in the beginning of the file.
3) Import closely related modules renaming them to the same name, unless imported functions conflict, or two of them are imported as a whole, without import list.
4) If possible, try to avoid using functions with the same name from different modules, even if these modules are renamed differently. If someone knows what function X.foo does, he is likely to be confused by the function Y.foo. If it's inevitable, consider creating a separate, very small, module that imports both functions and exports them under different names.
Except from potential name clashes -- which can be got around by other means -- is there any benefit to importing only the parts from a module that you need:
import SomeModule (x, y, z)
...verses just importing all of it, which is terser and easier to maintain:
import SomeModule
Would it make the binary smaller, for instance?
Name clashes and binary size optimization are just two of the benefits you can get. Indeed, it is a good practice to always identify what you want to get from the outside world of your code. So, whenever people look at your code they will know what exactly your code requesting.
This also gives you a very good chance to creat mocking solutions for test, since you can work through the list of imports and write mockings for them.
Unfortunately, in Haskell the type class instances are not that easy. They are imported implicitly and so can creates conflicts, also they may makes mocking harder, since there is no way to specify specific class instances only. Hopefully this can be fixed in future versions of Haskell.
UPDATE
The benifits I listed above (code maintenance and test mocking) are not limited to Haskell. Actually, it is also common practice in Java, as I know. In Java you can just import a single class, or even a single static variable/method. Unfortunately again, you still cannot selectively import member functions.
No, it's only for the purpose of preventing name clashes. The other mechanism for preventing name clashes - namely import qualified - results in more verbose (less readable) code.
It wouldn't make the binary smaller - consider that functions in a given module all reference each other, usually, so they need to be compiled together.
It seems that Template Haskell is often viewed by the Haskell community as an unfortunate convenience. It's hard to put into words exactly what I have observed in this regard, but consider these few examples
Template Haskell listed under "The Ugly (but necessary)" in response to the question Which Haskell (GHC) extensions should users use/avoid?
Template Haskell considered a temporary/inferior solution in Unboxed Vectors of newtype'd values thread (libraries mailing list)
Yesod is often criticized for relying too much on Template Haskell (see the blog post in response to this sentiment)
I've seen various blog posts where people do pretty neat stuff with Template Haskell, enabling prettier syntax that simply wouldn't be possible in regular Haskell, as well as tremendous boilerplate reduction. So why is it that Template Haskell is looked down upon in this way? What makes it undesirable? Under what circumstances should Template Haskell be avoided, and why?
One reason for avoiding Template Haskell is that it as a whole isn't type-safe, at all, thus going against much of "the spirit of Haskell." Here are some examples of this:
You have no control over what kind of Haskell AST a piece of TH code will generate, beyond where it will appear; you can have a value of type Exp, but you don't know if it is an expression that represents a [Char] or a (a -> (forall b . b -> c)) or whatever. TH would be more reliable if one could express that a function may only generate expressions of a certain type, or only function declarations, or only data-constructor-matching patterns, etc.
You can generate expressions that don't compile. You generated an expression that references a free variable foo that doesn't exist? Tough luck, you'll only see that when actually using your code generator, and only under the circumstances that trigger the generation of that particular code. It is very difficult to unit test, too.
TH is also outright dangerous:
Code that runs at compile-time can do arbitrary IO, including launching missiles or stealing your credit card. You don't want to have to look through every cabal package you ever download in search for TH exploits.
TH can access "module-private" functions and definitions, completely breaking encapsulation in some cases.
Then there are some problems that make TH functions less fun to use as a library developer:
TH code isn't always composable. Let's say someone makes a generator for lenses, and more often than not, that generator will be structured in such a way that it can only be called directly by the "end-user," and not by other TH code, by for example taking a list of type constructors to generate lenses for as the parameter. It is tricky to generate that list in code, while the user only has to write generateLenses [''Foo, ''Bar].
Developers don't even know that TH code can be composed. Did you know that you can write forM_ [''Foo, ''Bar] generateLens? Q is just a monad, so you can use all of the usual functions on it. Some people don't know this, and because of that, they create multiple overloaded versions of essentially the same functions with the same functionality, and these functions lead to a certain bloat effect. Also, most people write their generators in the Q monad even when they don't have to, which is like writing bla :: IO Int; bla = return 3; you are giving a function more "environment" than it needs, and clients of the function are required to provide that environment as an effect of that.
Finally, there are some things that make TH functions less fun to use as an end-user:
Opacity. When a TH function has type Q Dec, it can generate absolutely anything at the top-level of a module, and you have absolutely no control over what will be generated.
Monolithism. You can't control how much a TH function generates unless the developer allows it; if you find a function that generates a database interface and a JSON serialization interface, you can't say "No, I only want the database interface, thanks; I'll roll my own JSON interface"
Run time. TH code takes a relatively long time to run. The code is interpreted anew every time a file is compiled, and often, a ton of packages are required by the running TH code, that have to be loaded. This slows down compile time considerably.
This is solely my own opinion.
It's ugly to use. $(fooBar ''Asdf) just does not look nice. Superficial, sure, but it contributes.
It's even uglier to write. Quoting works sometimes, but a lot of the time you have to do manual AST grafting and plumbing. The API is big and unwieldy, there's always a lot of cases you don't care about but still need to dispatch, and the cases you do care about tend to be present in multiple similar but not identical forms (data vs. newtype, record-style vs. normal constructors, and so on). It's boring and repetitive to write and complicated enough to not be mechanical. The reform proposal addresses some of this (making quotes more widely applicable).
The stage restriction is hell. Not being able to splice functions defined in the same module is the smaller part of it: the other consequence is that if you have a top-level splice, everything after it in the module will be out of scope to anything before it. Other languages with this property (C, C++) make it workable by allowing you to forward declare things, but Haskell doesn't. If you need cyclic references between spliced declarations or their dependencies and dependents, you're usually just screwed.
It's undisciplined. What I mean by this is that most of the time when you express an abstraction, there is some kind of principle or concept behind that abstraction. For many abstractions, the principle behind them can be expressed in their types. For type classes, you can often formulate laws which instances should obey and clients can assume. If you use GHC's new generics feature to abstract the form of an instance declaration over any datatype (within bounds), you get to say "for sum types, it works like this, for product types, it works like that". Template Haskell, on the other hand, is just macros. It's not abstraction at the level of ideas, but abstraction at the level of ASTs, which is better, but only modestly, than abstraction at the level of plain text.*
It ties you to GHC. In theory another compiler could implement it, but in practice I doubt this will ever happen. (This is in contrast to various type system extensions which, though they might only be implemented by GHC at the moment, I could easily imagine being adopted by other compilers down the road and eventually standardized.)
The API isn't stable. When new language features are added to GHC and the template-haskell package is updated to support them, this often involves backwards-incompatible changes to the TH datatypes. If you want your TH code to be compatible with more than just one version of GHC you need to be very careful and possibly use CPP.
There's a general principle that you should use the right tool for the job and the smallest one that will suffice, and in that analogy Template Haskell is something like this. If there's a way to do it that's not Template Haskell, it's generally preferable.
The advantage of Template Haskell is that you can do things with it that you couldn't do any other way, and it's a big one. Most of the time the things TH is used for could otherwise only be done if they were implemented directly as compiler features. TH is extremely beneficial to have both because it lets you do these things, and because it lets you prototype potential compiler extensions in a much more lightweight and reusable way (see the various lens packages, for example).
To summarize why I think there are negative feelings towards Template Haskell: It solves a lot of problems, but for any given problem that it solves, it feels like there should be a better, more elegant, disciplined solution better suited to solving that problem, one which doesn't solve the problem by automatically generating the boilerplate, but by removing the need to have the boilerplate.
* Though I often feel that CPP has a better power-to-weight ratio for those problems that it can solve.
EDIT 23-04-14: What I was frequently trying to get at in the above, and have only recently gotten at exactly, is that there's an important distinction between abstraction and deduplication. Proper abstraction often results in deduplication as a side effect, and duplication is often a telltale sign of inadequate abstraction, but that's not why it's valuable. Proper abstraction is what makes code correct, comprehensible, and maintainable. Deduplication only makes it shorter. Template Haskell, like macros in general, is a tool for deduplication.
I'd like to address a few of the points dflemstr brings up.
I don't find the fact that you can't typecheck TH to be that worrying. Why? Because even if there is an error, it will still be compile time. I'm not sure if this strengthens my argument, but this is similar in spirit to the errors that you receive when using templates in C++. I think these errors are more understandable than C++'s errors though, as you'll get a pretty printed version of the generated code.
If a TH expression / quasi-quoter does something that's so advanced that tricky corners can hide, then perhaps it's ill-advised?
I break this rule quite a bit with quasi-quoters I've been working on lately (using haskell-src-exts / meta) - https://github.com/mgsloan/quasi-extras/tree/master/examples . I know this introduces some bugs such as not being able to splice in the generalized list comprehensions. However, I think that there's a good chance that some of the ideas in http://hackage.haskell.org/trac/ghc/blog/Template%20Haskell%20Proposal will end up in the compiler. Until then, the libraries for parsing Haskell to TH trees are a nearly perfect approximation.
Regarding compilation speed / dependencies, we can use the "zeroth" package to inline the generated code. This is at least nice for the users of a given library, but we can't do much better for the case of editing the library. Can TH dependencies bloat generated binaries? I thought it left out everything that's not referenced by the compiled code.
The staging restriction / splitting of compilation steps of the Haskell module does suck.
RE Opacity: This is the same for any library function you call. You have no control over what Data.List.groupBy will do. You just have a reasonable "guarantee" / convention that the version numbers tell you something about the compatibility. It is somewhat of a different matter of change when.
This is where using zeroth pays off - you're already versioning the generated files - so you'll always know when the form of the generated code has changed. Looking at the diffs might be a bit gnarly, though, for large amounts of generated code, so that's one place where a better developer interface would be handy.
RE Monolithism: You can certainly post-process the results of a TH expression, using your own compile-time code. It wouldn't be very much code to filter on top-level declaration type / name. Heck, you could imagine writing a function that does this generically. For modifying / de-monolithisizing quasiquoters, you can pattern match on "QuasiQuoter" and extract out the transformations used, or make a new one in terms of the old.
This answer is in response to the issues brought up by illissius, point by point:
It's ugly to use. $(fooBar ''Asdf) just does not look nice. Superficial, sure, but it contributes.
I agree. I feel like $( ) was chosen to look like it was part of the language - using the familiar symbol pallet of Haskell. However, that's exactly what you /don't/ want in the symbols used for your macro splicing. They definitely blend in too much, and this cosmetic aspect is quite important. I like the look of {{ }} for splices, because they are quite visually distinct.
It's even uglier to write. Quoting works sometimes, but a lot of the time you have to do manual AST grafting and plumbing. The [API][1] is big and unwieldy, there's always a lot of cases you don't care about but still need to dispatch, and the cases you do care about tend to be present in multiple similar but not identical forms (data vs. newtype, record-style vs. normal constructors, and so on). It's boring and repetitive to write and complicated enough to not be mechanical. The [reform proposal][2] addresses some of this (making quotes more widely applicable).
I also agree with this, however, as some of the comments in "New Directions for TH" observe, the lack of good out-of-the-box AST quoting is not a critical flaw. In this WIP package, I seek to address these problems in library form: https://github.com/mgsloan/quasi-extras . So far I allow splicing in a few more places than usual and can pattern match on ASTs.
The stage restriction is hell. Not being able to splice functions defined in the same module is the smaller part of it: the other consequence is that if you have a top-level splice, everything after it in the module will be out of scope to anything before it. Other languages with this property (C, C++) make it workable by allowing you to forward declare things, but Haskell doesn't. If you need cyclic references between spliced declarations or their dependencies and dependents, you're usually just screwed.
I've run into the issue of cyclic TH definitions being impossible before... It's quite annoying. There is a solution, but it's ugly - wrap the things involved in the cyclic dependency in a TH expression that combines all of the generated declarations. One of these declarations generators could just be a quasi-quoter that accepts Haskell code.
It's unprincipled. What I mean by this is that most of the time when you express an abstraction, there is some kind of principle or concept behind that abstraction. For many abstractions, the principle behind them can be expressed in their types. When you define a type class, you can often formulate laws which instances should obey and clients can assume. If you use GHC's [new generics feature][3] to abstract the form of an instance declaration over any datatype (within bounds), you get to say "for sum types, it works like this, for product types, it works like that". But Template Haskell is just dumb macros. It's not abstraction at the level of ideas, but abstraction at the level of ASTs, which is better, but only modestly, than abstraction at the level of plain text.
It's only unprincipled if you do unprincipled things with it. The only difference is that with the compiler implemented mechanisms for abstraction, you have more confidence that the abstraction isn't leaky. Perhaps democratizing language design does sound a bit scary! Creators of TH libraries need to document well and clearly define the meaning and results of the tools they provide. A good example of principled TH is the derive package: http://hackage.haskell.org/package/derive - it uses a DSL such that the example of many of the derivations /specifies/ the actual derivation.
It ties you to GHC. In theory another compiler could implement it, but in practice I doubt this will ever happen. (This is in contrast to various type system extensions which, though they might only be implemented by GHC at the moment, I could easily imagine being adopted by other compilers down the road and eventually standardized.)
That's a pretty good point - the TH API is pretty big and clunky. Re-implementing it seems like it could be tough. However, there are only really only a few ways to slice the problem of representing Haskell ASTs. I imagine that copying the TH ADTs, and writing a converter to the internal AST representation would get you a good deal of the way there. This would be equivalent to the (not insignificant) effort of creating haskell-src-meta. It could also be simply re-implemented by pretty printing the TH AST and using the compiler's internal parser.
While I could be wrong, I don't see TH as being that complicated of a compiler extension, from an implementation perspective. This is actually one of the benefits of "keeping it simple" and not having the fundamental layer be some theoretically appealing, statically verifiable templating system.
The API isn't stable. When new language features are added to GHC and the template-haskell package is updated to support them, this often involves backwards-incompatible changes to the TH datatypes. If you want your TH code to be compatible with more than just one version of GHC you need to be very careful and possibly use CPP.
This is also a good point, but somewhat dramaticized. While there have been API additions lately, they haven't been extensively breakage inducing. Also, I think that with the superior AST quoting I mentioned earlier, the API that actually needs to be used can be very substantially reduced. If no construction / matching needs distinct functions, and are instead expressed as literals, then most of the API disappears. Moreover, the code you write would port more easily to AST representations for languages similar to Haskell.
In summary, I think that TH is a powerful, semi-neglected tool. Less hate could lead to a more lively eco-system of libraries, encouraging the implementation of more language feature prototypes. It's been observed that TH is an overpowered tool, that can let you /do/ almost anything. Anarchy! Well, it's my opinion that this power can allow you to overcome most of its limitations, and construct systems capable of quite principled meta-programming approaches. It's worth the usage of ugly hacks to simulate the "proper" implementation, as this way the design of the "proper" implementation will gradually become clear.
In my personal ideal version of nirvana, much of the language would actually move out of the compiler, into libraries of these variety. The fact that the features are implemented as libraries does not heavily influence their ability to faithfully abstract.
What's the typical Haskell answer to boilerplate code? Abstraction. What're our favorite abstractions? Functions and typeclasses!
Typeclasses let us define a set of methods, that can then be used in all manner of functions generic on that class. However, other than this, the only way classes help avoid boilerplate is by offering "default definitions". Now here is an example of an unprincipled feature!
Minimal binding sets are not declarable / compiler checkable. This could lead to inadvertent definitions that yield bottom due to mutual recursion.
Despite the great convenience and power this would yield, you cannot specify superclass defaults, due to orphan instances http://lukepalmer.wordpress.com/2009/01/25/a-world-without-orphans/ These would let us fix the numeric hierarchy gracefully!
Going after TH-like capabilities for method defaults led to http://www.haskell.org/haskellwiki/GHC.Generics . While this is cool stuff, my only experience debugging code using these generics was nigh-impossible, due to the size of the type induced for and ADT as complicated as an AST. https://github.com/mgsloan/th-extra/commit/d7784d95d396eb3abdb409a24360beb03731c88c
In other words, this went after the features provided by TH, but it had to lift an entire domain of the language, the construction language, into a type system representation. While I can see it working well for your common problem, for complex ones, it seems prone to yielding a pile of symbols far more terrifying than TH hackery.
TH gives you value-level compile-time computation of the output code, whereas generics forces you to lift the pattern matching / recursion part of the code into the type system. While this does restrict the user in a few fairly useful ways, I don't think the complexity is worth it.
I think that the rejection of TH and lisp-like metaprogramming led to the preference towards things like method-defaults instead of more flexible, macro-expansion like declarations of instances. The discipline of avoiding things that could lead to unforseen results is wise, however, we should not ignore that Haskell's capable type system allows for more reliable metaprogramming than in many other environments (by checking the generated code).
One rather pragmatic problem with Template Haskell is that it only works when GHC's bytecode interpreter is available, which is not the case on all architectures. So if your program uses Template Haskell or relies on libraries that use it, it will not run on machines with an ARM, MIPS, S390 or PowerPC CPU.
This is relevant in practice: git-annex is a tool written in Haskell that makes sense to run on machines worrying about storage, such machines often have non-i386-CPUs. Personally, I run git-annex on a NSLU 2 (32 MB of RAM, 266MHz CPU; did you know Haskell works fine on such hardware?) If it would use Template Haskell, this is not possible.
(The situation about GHC on ARM is improving these days a lot and I think 7.4.2 even works, but the point still stands).
Why is TH bad? For me, it comes down to this:
If you need to produce so much repetitive code that you find yourself trying to use TH to auto-generate it, you're doing it wrong!
Think about it. Half the appeal of Haskell is that its high-level design allows you to avoid huge amounts of useless boilerplate code that you have to write in other languages. If you need compile-time code generation, you're basically saying that either your language or your application design has failed you. And we programmers don't like to fail.
Sometimes, of course, it's necessary. But sometimes you can avoid needing TH by just being a bit more clever with your designs.
(The other thing is that TH is quite low-level. There's no grand high-level design; a lot of GHC's internal implementation details are exposed. And that makes the API prone to change...)
I want to make a function called 'load' which imports definitions of functions from another file. I know how to import modules, but in my program I want the definitions of the functions to change depending on which module is 'loaded' with this new function. Is there a way to do this? Is there a better way to write my program so that this is not necessary?
I think it's type signature would look something like:
load :: String -> IO ()
where the string is the name of the module to be loaded (and the module is in the same directory).
Edit: Thanks for all the replies. Most people agree that this is not the best way to do what I want. Instead, is there a way to declare a global variable from within an I/O program. That is, I want it so that if I type (function "thing") into a function of type String -> IO(), I can still type 'thing' into GHCi to get the value assigned to it... Any suggestions?
There is almost certainly a better way to write your program so that this is not necessary. It's hard to say what without knowing more details about your situation, though. You could, for instance, represent the generic interface each module implements as a data-type, and have each module export a value of that type with the implementation.
Basically, the set of loaded modules is a static, compile-time property, so it makes no sense to want your program's behaviour to change based on its contents. Are you trying to write a library? Your users probably won't appreciate it doing such evil magic to their import lists :) (And it probably isn't possible without Template Haskell in that case, anyway.)
The exception is if you're trying to implement a Haskell tool (e.g. REPL, IDE, etc.) or trying to do plugins; i.e. dynamically-loaded modules of Haskell source code to integrate into your Haskell program. The first thing to try for those should be hint, but you may find you need something more advanced; in that case, the GHC API is probably your best bet. plugins used to be the de-facto standard in this area, but it doesn't seem to compile with GHC 7; you might want to check out direct-plugins, a simplified implementation of a similar interface that does.
mueval might be relevant; it's designed for executing short (one-line) snippets of Haskell code in a safe sandbox, as used by lambdabot.
Unless you're building a Haskell IDE or something like that, you most likely don't need this (^1).
But, in the case you do, there is always the hint-package, which allows you to embed a haskell interpreter into your program. This allows you to both load haskell modules and to convert strings into haskell values at runtime. There is a nice example of how to use it here
^1: If you're looking for a way to make things polymorphic, i.e. changing some, but not all definitions of in your code, you're probably looking for typeclasses.
With regards to your edit, perhaps you might be interested in IORef.