How are the Haddock module fields Portability, Stability and Maintainer used? - haskell

In lots of Haddock-generated module documentation (e.g. Prelude), a small box in the top-right can be seen, containing portability, stability and maintainer information:
From looking at the source code to such modules and experimentation, I confirmed that this information is generated from lines like the following in the module description:
-- Maintainer : libraries@haskell.org
-- Stability : stable
-- Portability : portable
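For context, the full header in a module like Prelude looks roughly like this (reproduced from memory of the base sources, so treat the exact spacing as approximate); only the last three fields seem to feed the info box:
-- |
-- Module      :  Prelude
-- Copyright   :  (c) The University of Glasgow 2001
-- License     :  BSD-style (see the file libraries/base/LICENSE)
--
-- Maintainer  :  libraries@haskell.org
-- Stability   :  stable
-- Portability :  portable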
There are several strange things about this:
The fields only seem to work in this order; any fields put out of order are simply treated as part of the module description itself. This is despite the fact that the order in the source file is the opposite of the order in the generated documentation!
I have been unable to find any official documentation of these fields. There is a Cabal package property named stability, the example values of which match the values I've seen in the equivalent Haddock fields, but beyond that, I've found nothing.
So: How are these fields intended to be used, and are they documented anywhere?
In particular, I'd like to know:
The full list of commonly-used values for Portability and Stability. This HaskellWiki page has a list, but I'd like to know where this list originated from.
The criteria for deciding whether a module is portable or non-portable. In particular, the package I would like the answers to these questions for, acme-strfry, is an FFI binding to strfry, a function only available in glibc. Is the package non-portable, because it only works on glibc systems, or portable, because it does not use any Haskell language extensions? The common usage seems to imply the latter.
Why a specific order of fields is required in the source file, and why it's the opposite of the ordering in the generated documentation.

Oh, I thought those fields were from the cabal package description. They don't seem to be documented at all in Haddock's docs. I've found this ticket, which doesn't really answer your question, but:
http://trac.haskell.org/haddock/ticket/71
So if it's freeform anyway, why not just write "non-portable (depends on glibc)"? I've even seen "portable (depends on ghc)", which is odd. I also wonder what happens with modules that were non-portable due to some non-Haskell98 extension Foo, after Foo was added to Haskell 2010.
Note that the Cabal documentation you link to also says stability is freeform. Of course, even if Haddock or Cabal were to define the acceptable values, it would still be up to the maintainer to subjectively select one.
About the specific order, you should probably just ask at the haddock mailing list, or check the source and file a bug.
PS: strfry is an invaluable contribution to the Haskell community, but it should be pure and portable, don't you think?

Ah yes, one of the more obscure and crufty features of Haddock.
As best as I can tell, it's just an undocumented hack. There's no sane reason why the order of the fields should matter, but it does. The specific choice of formatting (i.e., as a special form inside the module comment rather than as a separate block of some kind) isn't the best either. My guess is that somebody wanted to quickly add this feature one day, so they hacked up something minimal but functioning, and left it at that. (Without bothering to document it.)
Personally, I just don't bother with these fields at all. The information is available from Cabal, so I don't bother duplicating it in Haddock as well. Perhaps some day Cabal will pass this information to Haddock automatically...

Related

Dead code and/or how to generate a cross reference from Haskell source

I've got some unused functionality in my codebase, but it's hard to identify. The code has evolved over the last year as I explore its problem space and possible solutions. What I need to do is find that unused code so I can get rid of it. I'm happy if it deals with the problem on an exportable-name basis. GHC has warnings that deal with non-exported unused code. Any tools specific to this task would be of interest.
However, I'm curious about a comprehensive cross referencing tool. I can find the unused code with such a tool. Years ago when I was working in C and assembler, I found that a good xref was a pretty handy tool, useful for many different purposes.
I'm getting nowhere with googling. Apparently in Haskell the dominant meaning of cross-reference is within literate programming. Though maybe something there would be useful.
I don’t know of such a tool, so in the past I have done a bit of a hack instead.
If you have a comprehensive test suite, you can run it with GHC’s code coverage tracing enabled. Compile with -fhpc and use hpc markup to generate annotated source. This gives you the union of unused code and untested code, both of which you would probably like to address anyway.
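Roughly, that workflow looks like this (assuming the tests build as a single Main.hs; adjust names to your project):
$ ghc -fhpc --make Main.hs -o main   # build with coverage instrumentation
$ ./main                             # run the tests; this writes main.tix
$ hpc markup main.tix                # annotated HTML, one page per module
$ hpc report main.tix                # or just a textual coverage summary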
SourceGraph can give you a bunch of information which you may also find useful.
There is now a tool for this very purpose: https://hackage.haskell.org/package/weeder
It's been around since 2017, and while it has limitations, it definitely helps with large codebases.

Haskell, Hackage, GHC and productivity. How to solve a real example?

I don't know the best way to solve a (probably) simple, Hackage-related problem.
I asked for help about it before (http://stackoverflow.com/questions/12841599/haskell-hackage-ghc-and-productivity-what-to-do), but I didn't manage to explain it well.
Today I'm facing a similar problem.
The concrete problem isn't really the point, but here it is:
`Write a function that, given a string, removes diacritics.`
Example:
`simpleWord "Cigüeñal" <-> "Ciguenal"`
The correct way (I think) is to use standard Unicode normalization. Some languages/frameworks (.NET, PHP, Python, ...) have a related function.
In Haskell, thanks to the Hackage community, one exists too:
`Text.Unicode.Normalization.normalize`
But I couldn't install it with (e.g.) GHC 7.4, because compact-string (which it depends on) fails to build.
A fix for compact-string exists (compact-string-fix), so: can't I use cabal to install it directly? Should I download and patch it? Should I look for another alternative for this function?
I explained a concrete real case (simple or complex, it doesn't matter); the question I'm asking for help with is: how can a novice Haskeller know the best way to pick the right libraries and the right (balanced) GHC version, without hitting a wall?
I'm really lost about it.
Really, thank you very much for any suggestion.
Best regards.
The documentation for compact-string says, "This package is obsolete. Use text instead.".
The documentation for text says, "To use an extended and very rich family of functions for working with Unicode text (including normalization, regular expressions, non-standard encodings, text breaking, and locales), see the text-icu package.".
The documentation for text-icu shows that it successfully builds on GHC 7.4 and has support for Unicode normalization.
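Putting those pieces together, here is a sketch of the diacritic-stripping function on top of text-icu (I'm assuming its Data.Text.ICU.Normalize API here; double-check the module and function names against the version you install):

import Data.Char (isMark)
import qualified Data.Text as T
import Data.Text.ICU.Normalize (NormalizationMode (NFD), normalize)

-- Decompose each character (NFD), then drop the combining marks,
-- so "Cigüeñal" becomes "Ciguenal".
simpleWord :: T.Text -> T.Text
simpleWord = T.filter (not . isMark) . normalize NFD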
Here's the general process I follow when deciding which packages to use. First, I try to identify multiple packages that meet my needs. Then I look more closely at each package to try to determine which ones are the best for me, according to the criteria listed below.
It's usually better to use packages that are currently maintained. To determine if a package is currently maintained, I check the "Upload date" link on the package description page. (Of course, there are some old tried-and-true packages that haven't been modified in ages because they don't need modification.)
It's usually better to use packages that are mature, so I check the version number on the package description page. A package with a version number of 7.3.5 is probably more mature than a version 0.1 package.
It's usually better to use packages that are well documented. Sometimes there's a nice example of how to use the package in the Haddock documentation (yay!). I'll also check the "Home page" link on the package description page, because often there will be more documentation there.
It's usually better to use packages that are popular, because any problems will probably be addressed quickly, and other users can answer questions. I'll usually do a Google search and see which packages are mentioned most often on Haskell mailing lists and StackOverflow.
It's usually better to use packages that don't require a lot of packages I don't already have, so I check the "Dependencies" section on the package description page.
I tend to follow this procedure when choosing a package for any programming language, not just Haskell.

Haskell module naming conventions

How should I name my Haskell modules for a program, not a library, and organize them in a hierarchy?
I'm making a ray tracer called Luminosity. First I had these modules:
Vector Colour Intersect Trace Render Parse Export
Each module was fine on its own, but I felt like this lacked organization.
First, I put every module under Luminosity, so for example Vector was now Luminosity.Vector (I assume this is standard for a Haskell program?).
Then I thought: Vector and Colour are independent and could be reused, so they should be separated. But they're way too small to turn into libraries.
Where should they go? There is already (on hackage) a Data.Vector and Data.Colour, so should I put them there? Or will that cause confusion (even if I import them grouped with my other local imports)? If not there, should it be Luminosity.Data.Vector or Data.Luminosity.Vector? I'm pretty sure I've seen both used, although maybe I just happened to look at a project using a nonconventional structure.
I also have a simple TGA image exporter (Export) which can be independent from Luminosity. It appears the correct location would be Codec.Image.TGA, but again, should Luminosity be in there somewhere and if so, where?
It would be nice if Structure of a Haskell project or some other wiki explained this.
Unless your program is really big, don't organize the modules in a hierarchy. Why not? Because although computers are good at hierarchy, people aren't. People are good at meaningful names. If you choose good names you can easily handle 150 modules in a flat name space.
I felt like [a flat name space] lacked organization.
Hierarchical organization is not an end in itself. To justify splitting modules up into a hierarchy, you need a reason. Good reasons tend to have to do with information hiding or reuse. When you bring in information hiding, you are halfway to a library design, and when you are talking about reuse, you are effectively building a library. To morph a big program into "smaller program plus library" is a good strategy for software evolution, but it looks like you're just starting, and your program isn't yet big enough to evolve that way.
These issues are largely independent of the programming language you are using. I recommend reading some of David Parnas's work on product lines and program families, and also Matthias Blume's underappreciated paper Hierarchical Modularity. These works will give you some more concrete ideas about when hierarchy starts to serve a purpose.
First of all I put every module under Luminosity
I think this was a good move. It clarifies to anyone that is reading the code that these modules were made specifically for the Luminosity project.
If you write a module with the intent of simulating or improving upon an existing library, or of filling a gap where you believe a particular generic library is missing, then in that rare case, drop the prefix and name it generically. For an example of this, see how the pipes package exports Control.Monad.Trans.Free, because the author was, for whatever reason, not satisfied with existing implementations of Free monads.
Then I thought, Vector and Colour are pretty much independent and could be reused, so they should be separated. But they're way too small to separate off into a library (125 and 42 lines respectively). Where should they go?
If you don't make a separate library, then probably leave them at Luminosity.Vector and Luminosity.Colour. If you do make separate libraries, then try emailing the target audience of those libraries and see how other people think these libraries should be named and categorized. Whether or not you split these out into separate libraries is entirely up to you and how much benefit you think these separate libraries might provide for other people.

options library like Google GFlags for Haskell

I'm interested in having something very similar to Google's flags library for Haskell.
Here is the small introduction to gflags that demonstrates why I love it: http://gflags.googlecode.com/svn/trunk/doc/gflags.html
I looked into the various getopt like libraries on Hackage and I haven't found one that matches the simplicity and flexibility of gflags.
Namely, I'd like to have these features:
generates --help (with default values mentioned in the help),
besides parsing the options given by the user, it should also err on unmatched options, so the user has a chance to note typos,
flags can be declared in any module easily (hopefully at the top-level, Template Haskell hackery acceptable if needed),
no need in main to call out to all the modules where I declared flags, instead the flags somehow register themselves at startup/linking/whatever time,
it's OK if main has to call a general initialization function, like in gflags' google::ParseCommandLineFlags(&argc, &argv, true);
flags can be used purely (yeah, I think this is an appropriate usage of unsafePerformIO to make the API more simple).
After looking around without success, I played with the idea of doing this myself (and of course sharing it on Hackage). However, I have absolutely no idea how to implement the registration part. I need something similar to GCC's __attribute__((constructor)) or to C++'s static initialization, but in Haskell. A standard top-level unsafePerformIO is not enough, because that is lazy, so it won't be evaluated before main starts to run.
After investigating all the solutions on Hackage (thanks for all the tips!), I went ahead with Don's typeclass implementation idea and created a library named HFlags.
It's on hackage: http://hackage.haskell.org/package/hflags
I also have a blog post, describing it: http://blog.risko.hu/2012/04/ann-hflags-0.html
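To give a flavour of the API, a small example along the lines of the one shipped with the package (details may differ between HFlags versions):

{-# LANGUAGE TemplateHaskell #-}
import HFlags

-- Flags can be declared at the top level of any module; Template Haskell
-- takes care of registering them.
defineFlag "name" ("Indiana Jones" :: String) "Who to greet."
defineFlag "repeat" (3 :: Int) "Number of times to repeat the greeting."

main :: IO ()
main = do
  _ <- $initHFlags "Simple HFlags example"   -- parses argv, returns non-flag args
  sequence_ (replicate flags_repeat (putStrLn ("Hello " ++ flags_name)))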
You might like CmdArgs, though I haven't used it enough to tell if it satisfies all your constraints.
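For a sense of its style, a minimal sketch (the record and flag names here are made up for illustration):

{-# LANGUAGE DeriveDataTypeable #-}
import System.Console.CmdArgs

-- One record describes the whole command line; --help is derived from it.
data Options = Options
  { name    :: String
  , repeats :: Int
  } deriving (Show, Data, Typeable)

defaults :: Options
defaults = Options
  { name    = "world" &= help "Who to greet"
  , repeats = 1       &= help "How many times to greet"
  }

main :: IO ()
main = do
  opts <- cmdArgs defaults
  mapM_ putStrLn (replicate (repeats opts) ("Hello " ++ name opts))

Note that, unlike gflags, all the flags live in one record, so it doesn't give you the declare-anywhere registration you asked for.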

Where do QuickCheck instances belong in a cabal package?

I have a cabal package that exports a type NBT which might be useful for other developers. I've gone through the trouble of defining an Arbitrary instance for my type, and it would be a shame to not offer it to other developers for testing their code that integrates my work.
However, I want to avoid situations where my instance might get in the way. Perhaps the other developer has a different idea for what the Arbitrary instance should be. Perhaps my package's dependency on a particular version of QuickCheck might interfere with or be unwanted in the dependencies of the client project.
My ideas, in no particular order, are:
Leave the Arbitrary instance next to the definition of the type, and let clients deal with shadowing the instance or overriding the QuickCheck version number.
Make the Arbitrary instance an orphan instance in a separate module within the same package, say Data.NBT.Arbitrary. The dependency on QuickCheck for the overall package remains.
Offer the Arbitrary instance in a totally separate package, so that it can be listed as a separate test dependency for client projects.
Conditionally include both the Arbitrary instance and the QuickCheck dependency in the main package, but only if a flag like -ftest is set.
I've seen combinations of all of these used in other libraries, but haven't found any consensus on which works best. I want to try and get it right before uploading to Hackage.
On the basis of not much specific experience, but a general desire for robustness, the guiding principle for package dependencies should perhaps be
From each according to their ability; to each according to their need.
It's good to keep the dependencies of a package to the minimum needed for its essential functionality. That suggests option 3 or option 4 to me. Of course, it's a pain to chop the package up so much. If options are capable of expressing the conditionality involved, then option 4 sounds like a sensible approach, based on using language effectively to say what you mean.
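For concreteness, option 4 might look roughly like this in the .cabal file (a sketch only; the flag and module names just follow the ones mentioned in the question):

flag test
  description: Also build the orphan Arbitrary instances
  default:     False

library
  exposed-modules: Data.NBT
  build-depends:   base >= 4 && < 5
  -- Only people who ask for -ftest pay for the QuickCheck dependency.
  if flag(test)
    exposed-modules: Data.NBT.Arbitrary
    build-depends:   QuickCheck >= 2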
It would be really good if a consensus emerged about which one switch we need to flick to get the testing kit as well as the basic functionality.
It's also clear that there's room for refinement here. It's amazing that Cabal works as well as it does, but it could allow for more sophisticated notions of "package", perhaps after the manner of the SML module system. Translating dependencies into function types, we basically get to write
simplePackage :: (Dependency1, .., Dependencyn) -> Deliverable
but one could imagine more elaborate combinations of products and functions, like
fancyPackage :: BasicDependency -> (BasicDeliverable, HelpfulExtras -> Gravy)
Until then, pick the option that most accurately reflects the actual deal. And tell us about it, so we can build that consensus.
The problem comes down to: how likely is it that someone using your library will be wanting to run QuickCheck tests using your NBT type?
If it is likely, and the Arbitrary instance is detailed (and thus not likely to change for different people), it would probably be best to ship it with your package, especially if you're going to make sure you keep updating the package (as for using a flag or not, that comes down to a bit of personal preference). If the instance is relatively simple however (and thus more likely that people would want to customise it), then it might be an idea to just provide a sample instance in the documentation.
If the type is primarily internal in nature and not likely to be used by others wanting to run tests, then using a flag to conditionally bring in QuickCheck is probably the best way to go to avoid unnecessary dependencies (i.e. the test suite is there just so you can test the package).
I'm not a fan of having QuickCheck-only packages in general, though it might be useful in some situations.
