Where do QuickCheck instances belong in a cabal package? - haskell

I have a cabal package that exports a type NBT which might be useful for other developers. I've gone through the trouble of defining an Arbitrary instance for my type, and it would be a shame to not offer it to other developers for testing their code that integrates my work.
However, I want to avoid situations where my instance might get in the way. Perhaps the other developer has a different idea for what the Arbitrary instance should be. Perhaps my package's dependency on a particular version of QuickCheck might interfere with or be unwanted in the dependencies of the client project.
My ideas, in no particular order, are:
Leave the Arbitrary instance next to the definition of the type, and let clients deal with shadowing the instance or overriding the QuickCheck version number.
Make the Arbitrary instance an orphan instance in a separate module within the same package, say Data.NBT.Arbitrary. The dependency on QuickCheck for the overall package remains.
Offer the Arbitrary instance in a totally separate package, so that it can be listed as a separate test dependency for client projects.
Conditionally include both the Arbitrary instance and the QuickCheck dependency in the main package, but only if a flag like -ftest is set.
I've seen combinations of all of these used in other libraries, but haven't found any consensus on which works best. I want to try and get it right before uploading to Hackage.

On the basis of not much specific experience, but a general desire for robustness, the guiding principle for package dependencies should perhaps be
From each according to their ability; to each according to their need.
It's good to keep the dependencies of a package to the minimum needed for its essential functionality. That suggests option 3 or option 4 to me. Of course, it's a pain to chop the package up so much. If options are capable of expressing the conditionality involved, then option 4 sounds like a sensible approach, based on using language effectively to say what you mean.
It would be really good if a consensus emerged about which one switch we need to flick to get the testing kit as well as the basic functionality.
It's also clear that there's room for refinement here. It's amazing that Cabal works as well as it does, but it could allow for more sophisticated notions of "package", perhaps after the manner of the SML module system. Translating dependencies into function types, we basically get to write
simplePackage :: (Dependency1, .., Dependencyn) -> Deliverable
but one could imagine more elaborate combinations of products and functions, like
fancyPackage :: BasicDependency -> (BasicDeliverable, HelpfulExtras -> Gravy)
Until then, pick the option that most accurately reflects the actual deal. And tell us about it, so we can build that consensus.

The problem comes down to: how likely is it that someone using your library will be wanting to run QuickCheck tests using your NBT type?
If it is likely, and the Arbitrary instance is detailed (and thus not likely to change for different people), it would probably be best to ship it with your package, especially if you're going to make sure you keep updating the package (as for using a flag or not, that comes down to a bit of personal preference). If the instance is relatively simple however (and thus more likely that people would want to customise it), then it might be an idea to just provide a sample instance in the documentation.
If the type is primarily internal in nature and not likely to be used by others wanting to run tests, then using a flag to conditionally bring in QuickCheck is probably the best way to go to avoid unnecessary dependencies (i.e. the test suite is there just so you can test the package).
I'm not a fan of having QuickCheck-only packages in general, though it might be useful in some situations.

Related

Is it common to have example values for compile-time checking and where should they go in the code?

I have a reasonably complex structure of data types and records, which is not so easy to make sense of by just looking at production code for someone not familiar with the codebase.
To make sense of it, I created this kind of dummy functions that have two advantages: 1. they're checked at compile-time and 2. they serve as some kind of documentation, showcasing an example of how the overall structure of data types and records mixes together:
-- benefit 1: a newcomer can quickly make sense of the type system
-- benefit 2: easier to keep track of how the types evolve because of compile-time checking
exampleValue1 :: ApiResponseContent
exampleValue1 = ApiOnlineResponseContent [
OnlineResultRow (EntityId 10) [Just (FvInt 1), Just (FvFloat 1.5), Nothing],
OnlineResultRow (EntityId 20) [Just (FvInt 2), Nothing, Just (FvBool True)]
]
The only thing that bothers me a bit is that they feel a bit awkward put within the production code, as they're clearly dead code. However they're not tests either, they're just compiled-time-checked examples of how values can be assembled together from complex nested types. Therefore, they clearly don't belong to production code, but they don't quite belong to tests either.
Is it common practice to have this kind of compile-time examples? And where should they be placed within the codebase?
Have you considered including the examples as part of Haddock documentation?
Otherwise, there's a school of thought that try to reframe tests as examples. You can find a brief mention of this in Gerard Meszaros's work on unit testing. Dan North has also repeatedly used similar language, but it can often be difficult to track down his many iterations of such ideas. He tends to 'think in public' - here's one such reflection:
"I call this development, using example-guided design"
I understand that the kind of example being asked about isn't quite like that. Still, I think that it better belongs with test code than with production code.
If you put the examples in the production code, you essentially make it part of the API of your library (if you're shipping a library). This means that changing the examples would constitute a breaking change. That doesn't seem right to me.
In the BDD/DDD community, there's a lot of emphasis on tests as examples, also in the sense that automated tests serve as documentation. If Haddock documentation isn't an option, I'd consider putting the examples in the test code, as documentation. I sometimes do that by simply putting such 'vacuous tests' in a file called examples, perhaps with a little comment at the top to explain that the code in that file assists learning rather than verify behaviour.
It dodges the risk of introducing redundant breaking changes in the production code, and seems conceptually like a better fit.
I disagree that these aren't tests. Even “does this example compile” could be seen as a test, but probably you could also use them to actually test some functions, while you're at it.
So, put these definitions in your test suite, and use them for unit testing the functions that will actually be dealing with such values.
The main convention I’ve seen for housing such examples is to include ….Tutorial modules, such as Dhall.Tutorial and Clash.Tutorial, or parallel …-tutorial packages such as lens-tutorial.
These contain Haddock documentation, organised so as to read in linear order, alongside example definitions like yours.
With good examples, I find this convention very helpful for understanding and experimenting with a new package in addition to types and reference docs. It’s also fairly discoverable when browsing packages on Hackage, especially if you ensure that the reference and README link to the tutorial for longer explanations.
These modules can be just compiled as basic validation, but having more machine validation for docs is incredibly valuable, so I think it’s even better to link them into the test suite (e.g. hspec) as examples for unit tests (HUnit) or property tests (QuickCheck/hedgehog), or organise them as documentation tests (doctest).
In addition to GHC’s code coverage tools, weeder can help identify examples that aren’t being tested.

Languages with a NodeJS/CommonJS style module system

I really like the way NodeJS (and it's browser-side counterparts) handle modules:
var $ = require('jquery');
var config = require('./config.json');
module.exports = function(){};
module.exports = {...}
I am actually rather disappointed by the ES2015 'import' spec which is very similar to the majority of languages.
Out of curiosity, I decided to look for other languages which implement or even support a similar export/import style, but to no avail.
Perhaps I'm missing something, or more likely, my Google Foo isn't up to scratch, but it would be really interesting to see which other languages work in a similar way.
Has anyone come across similar systems?
Or maybe someone can even provide reasons that it isn't used all that often.
It is nearly impossible to properly compare these features. One can only compare their implementation in specific languages. I collected my experience mostly with the language Java and nodejs.
I observed these differences:
You can use require for more than just making other modules available to your module. For example, you can use it to parse a JSON file.
You can use require everywhere in your code, while import is only available at the top of a file.
require actually executes the required module (if it was not yet executed), while import has a more declarative nature. This might not be true for all languages, but it is a tendency.
require can load private dependencies from sub directories, while import often uses one global namespace for all the code. Again, this is also not true in general, but merely a tendency.
Responsibilities
As you can see, the require method has multiple responsibilities: declaring module dependencies and reading data. This is better separated with the import approach, since import is supposed to only handle module dependencies. I guess, what you like about being able to use the require method for reading JSON is, that it provides a really easy interface to the programmer. I agree that it is nice to have this kind of easy JSON reading interface, however there is no need to mix it with the module dependency mechanism. There can just be another method, for example readJson(). This would separate the concerns, so the require method would only be needed for declaring module dependencies.
Location in the Code
Now, that we only use require for module dependencies, it is a bad practice to use it anywhere else than at the top of your module. It just makes it hard to see the module dependencies when you use it everywhere in your code. This is why you can use the import statement only on top of your code.
I don't see the point where import creates a global variable. It merely creates a consistent identifier for each dependency, which is limited to the current file. As I said above, I recommend doing the same with the require method by using it only at the top of the file. It really helps to increase the readability of the code.
How it works
Executing code when loading a module can also be a problem, especially in big programs. You might run into a loop where one module transitively requires itself. This can be really hard to resolve. To my knowledge, nodejs handles this situation like so: When A requires B and B requires A and you start by requiring A, then:
the module system remembers that it currently loads A
it executes the code in A
it remembers that is currently loads B
it executes the code in B
it tries to load A, but A is already loading
A is not yet finished loading
it returns the half loaded A to B
B does not expect A to be half loaded
This might be a problem. Now, one can argue that cyclic dependencies should really be avoided and I agree with this. However, cyclic dependencies should only be avoided between separate components of a program. Classes in a component often have cyclic dependencies. Now, the module system can be used for both abstraction layers: Classes and Components. This might be an issue.
Next, the require approach often leads to singleton modules, which cannot be used multiple times in the same program, because they store global state. However, this is not really the fault of the system but the programmers fault how uses the system in the wrong way. Still, my observation is that the require approach misleads especially new programmers to do this.
Dependency Management
The dependency management that underlays the different approaches is indeed an interesting point. For example Java still misses a proper module system in the current version. Again, it is announced for the next version, but who knows whether this will ever become true. Currently, you can only get modules using OSGi, which is far from easy to use.
The dependency management underlaying nodejs is very powerful. However, it is also not perfect. For example non-private dependencies, which are dependencies that are exposed via the modules API, are always a problem. However, this is a common problem for dependency management so it is not limited to nodejs.
Conclusion
I guess both are not that bad, since each is used successfully. However, in my opinion, import has some objective advantages over require, like the separation of responsibilities. It follows that import can be restricted to the top of the code, which means there is only one place to search for module dependencies. Also, import might be a better fit for compiled languages, since these do not need to execute code to load code.

Create orphan instances or add spurious dependencies?

I'm working on updating my ReadArgs package. I had a request to add Arguable instances for Data.Text and FileSystem.Path.FilePath. The former is no big deal, since it's in the base package, but the latter requires system-filepath
So I could release a ReadArgs-ext package, chock full of orphan instances, or I could update the ReadArgs package with an additional external dependency. Which option makes more sense?
My usual rule of thumb is to tend towards adding the instances for packages that are in the Haskell Platform, but don't involve less portable elements such as graphics. This covers both filepath and text. Since you are already dealing with the outside world for command line arguments, neither one of those seems like a particularly egregious addition.
Orphans can lead to pretty terrible problems.
I don't use them in 95% of my packages, and I go out of my way to avoid packages that use them.
The two exceptions I have at this point are a few missing monoids in reducers and a package full of vector-instances I picked up because I wasn't willing to make my entire hierarchy of packages depend on vector, downgrading everything from Safe to Trustworthy.
I find when I'm tempted to add an orphan instance, I can usually work around it by providing some kind of WrappedMonad-like newtype wrapper for lifting or lowering another class.

Haskell module naming conventions

How should I name my Haskell modules for a program, not a library, and organize them in a hierarchy?
I'm making a ray tracer called Luminosity. First I had these modules:
Vector Colour Intersect Trace Render Parse Export
Each module was fine on it's own, but I felt like this lacked organization.
First, I put every module under Luminosity, so for example Vector was now Luminosity.Vector (I assume this is standard for a haskell program?).
Then I thought: Vector and Colour are independent and could be reused, so they should be separated. But they're way too small to turn into libraries.
Where should they go? There is already (on hackage) a Data.Vector and Data.Colour, so should I put them there? Or will that cause confusion (even if I import them grouped with my other local imports)? If not there, should it be Luminosity.Data.Vector or Data.Luminosity.Vector? I'm pretty sure I've seen both used, although maybe I just happened to look at a project using a nonconventional structure.
I also have a simple TGA image exporter (Export) which can be independent from Luminosity. It appears the correct location would be Codec.Image.TGA, but again, should Luminosity be in there somewhere and if so, where?
It would be nice if Structure of a Haskell project or some other wiki explained this.
Unless your program is really big, don't organize the modules in a hierarchy. Why not? Because although computers are good at hierarchy, people aren't. People are good at meaningful names. If you choose good names you can easily handle 150 modules in a flat name space.
I felt like [a flat name space] lacked organization.
Hierarchical organization is not an end in itself. To justify splitting modules up into a hierarchy, you need a reason. Good reasons tend to have to do with information hiding or reuse. When you bring in information hiding, you are halfway to a library design, and when you are talking about reuse, you are effectively building a library. To morph a big program into "smaller program plus library" is a good strategy for software evolution, but it looks like you're just starting, and your program isn't yet big enough to evolve that way.
These issues are largely independent of the programming language you are using. I recommend reading some of David Parnas's work on product lines and program families, and also Matthias Blume's underappreciated paper Hierarchical Modularity. These works will give you some more concrete ideas about when hierarchy starts to serve a purpose.
First of all I put every module under Luminosity
I think this was a good move. It clarifies to anyone that is reading the code that these modules were made specifically for the Luminosity project.
If you write a module with the intent of simulating or improving upon an existing library, or of filling a gap where you believe a particular generic library is missing, then in that rare case, drop the prefix and name it generically. For an example of this, see how the pipes package exports Control.Monad.Trans.Free, because the author was, for whatever reason, not satisfied with existing implementations of Free monads.
Then I thought, Vector and Colour are pretty much independent and could be reused, so they should be separated. But they're way to small to separate off into a library (125 and 42 lines respectively). Where should they go?
If you don't make a separate library, then probably leave them at Luminosity.Vector and Luminosity.Colour. If you do make separate libraries, then try emailing the target audience of those libraries and see how other people think these libraries should be named and categorized. Whether or not you split these out into separate libraries is entirely up to you and how much benefit you think these separate libraries might provide for other people.

How are the Haddock module fields Portability, Stability and Maintainer used?

In lots of Haddock-generated module documentation (e.g. Prelude), a small box in the top-right can be seen, containing portability, stability and maintainer information:
From looking at the source code to such modules and experimentation, I confirmed that this information is generated from lines like the following in the module description:
-- Maintainer : libraries#haskell.org
-- Stability : stable
-- Portability : portable
There are several strange things about this:
The fields only seem to work in this order — any fields put out of order are simply treat as part of the module description itself. This is despite the fact that the order in the source file is the opposite of the order in the generated documentation!
I have been unable to find any official documentation of these fields. There is a Cabal package property named stability, the example values of which match the values I've seen in the equivalent Haddock fields, but beyond that, I've found nothing.
So: How are these fields intended to be used, and are they documented anywhere?
In particular, I'd like to know:
The full list of commonly-used values for Portability and Stability. This HaskellWiki page has a list, but I'd like to know where this list originated from.
The criteria for deciding whether a module is portable or non-portable. In particular, the package I would like the answers to these questions for, acme-strfry, is an FFI binding to strfry, a function only available in glibc. Is the package non-portable, because it only works on glibc systems, or portable, because it does not use any Haskell language extensions? The common usage seems to imply the latter.
Why a specific order of fields is required in the source file, and why it's the opposite of the ordering in the generated documentation.
Oh, I thought those fields were from the cabal package description. They don't seem to be documented at all on Haddock's docs. I've found this, which doesn't really answer your question but:
http://trac.haskell.org/haddock/ticket/71
So if it's freeform anyway, why not just write "non-portable (depends on glibc)"? I've seen even "portable (depends on ghc)", which is odd. I also wonder what happens with modules that were non-portable due to non-Haskell98 extension Foo, after Foo was added to Haskell 2010.
Note that the Cabal documenation you link to also says stability is freeform. Of course, even if haddock or cabal were to define what are the acceptable values, it'd still be up to the maintainer to subjectively select one.
About the specific order, you should probably just ask at the haddock mailing list, or check the source and file a bug.
PS: strfry is an invaluable contribution to the Haskell community, but it should be pure and portable, don't you think?
Ah yes, one of the more obscure and crufty features of Haddock.
As best as I can tell, it's just an undocumented hack. There's no sane reason why the order of the fields should matter, but it does. The specific choice of formatting (i.e., as a special form inside the module comment rather than as a separate block of some kind) isn't the best either. My guess is that somebody wanted to quickly add this feature one day, so they hacked up something minimal but functioning, and left it at that. (Without bothering to document it.)
Personally, I just don't bother with these fields at all. The information is available from Cabal, so I don't bother duplicating it in Haddock as well. Perhaps some day Cabal will pass this information to Haddock automatically...

Resources