Create orphan instances or add spurious dependencies? - haskell

I'm working on updating my ReadArgs package. I had a request to add Arguable instances for Data.Text and FileSystem.Path.FilePath. The former is no big deal, since it's in the base package, but the latter requires system-filepath
So I could release a ReadArgs-ext package, chock full of orphan instances, or I could update the ReadArgs package with an additional external dependency. Which option makes more sense?

My usual rule of thumb is to tend towards adding the instances for packages that are in the Haskell Platform, but don't involve less portable elements such as graphics. This covers both filepath and text. Since you are already dealing with the outside world for command line arguments, neither one of those seems like a particularly egregious addition.
Orphans can lead to pretty terrible problems.
I don't use them in 95% of my packages, and I go out of my way to avoid packages that use them.
The two exceptions I have at this point are a few missing monoids in reducers and a package full of vector-instances I picked up because I wasn't willing to make my entire hierarchy of packages depend on vector, downgrading everything from Safe to Trustworthy.
I find when I'm tempted to add an orphan instance, I can usually work around it by providing some kind of WrappedMonad-like newtype wrapper for lifting or lowering another class.

Related

What is the least obvious Go standard library package you can use to escape a sandbox?

I'm designing a project for a college-level computer security course, and I'm trying to include a vulnerability where code which is "clean" by virtue of a number of risky packages being blacklisted (unsafe, os, ioutil, etc). The question is this: can you think of a way to use other non-obvious Go standard library packages to escape the sandbox? "Escape the sandbox" here means reading/writing files, making network connections, breaking memory safety (which would allow you to do any of the other things), etc.
Things I've tried so far that haven't worked:
Using the reflect package to do unsafe pointer conversions (the reflect package seems really safe against this sort of abuse)
Using the reflect package to get access to a reference held by a random stdlib package to some sensitive function like os.Open (I haven't found any that actually keep function pointers or anything like that)

cabal hell with dependencies of ghc-baked in packages

I have the following instance of cabal hell:
(with ghc-7.8.3 built from source on x86_64 GNU/Linux,
and user-install: True in .cabal/config)
1) at some time, transformers-0.4.0.0 was installed (in user space, shadowing (?) transformers-0.3 from the global space)
2) later, several libraries pick transformers-0.4
3) then, I install hint, which depends on ghc, which depends on transformers-0.3, and which cannot be changed, since ghc is hard-wired.
result: I cannot use libraries from 2) and hint in one project.
As a work-around, I am putting constraint: transformers installed in .cabal/config, and rebuild. Is there a better way to handle this situation - or to avoid it in the first place?
Is there a better way to handle this situation.
No, your approach is sensible.
or to avoid it in the first place?
Tricky. Most people do not build stuff depending on ghc, so for them it makes sense to upgrade transformers etc. Therefore, your constraint is not a suitable default.
As Zeta writes: Sandboxes can help. If you had used sandboxes for your installations in (2), and used another sandbox for whatever tries to use both hint and (2), then it would simply build these dependencies dedicated for whatever you are building.
This comes at the expense of not sharing any space or build-time between the various things you are doing.

Why doesn't Haskell support mutually recursive modules?

Haskell supports mutually recursive let-bindings, which is great. Haskell doesn't support mutually recursive modules, which is sometimes terrible. I know that GHC has its .hs-boot mechanism, but I think that's a bit of a hack.
As far as I know, transparent support for mutually recursive modules should be relatively "simple", and it can be done exactly like mutually recursive let-bindings: instead of taking each separate module as a compilation unit, I would take every strongly connected component of the module dependency graph as a compilation unit.
Am I missing something here? Is there any non-trivial reason why Haskell doesn't support mutually recursive modules in this way?
This 6-year-old feature request ticket contains a fair amount of discussion, which you may have already seen. The gist of it is that it's not entirely a simple change as far as GHC is concerned. A few specific issues raised:
GHC currently has a lot of baked-in assumptions about how modules are processed during compilation, and changing those assumptions significantly would vastly outweigh the benefits of transparent support for mutually recursive modules.
Lumping groups of modules together means they have to be compiled together, which means more recompilation and awkwardness with generating separate .hi and .o files.
Backward compatibility with existing builds that use hs-boot files.
You have the potential for mutually-recursive bindings that cross module boundaries in a mutually-recursive module group, which raises issues with anything that involves implicit, module-level scope (such as defaulting, and possibly type class instances).
And of course, the potential for unknown, unanticipated bugs, as with anything that alters long-standing assumptions in GHC. Even without massive changes to the compilation process, many things are currently assumed to be compiled on a per-module basis.
A lot of people would like to see this supported, but so far nobody has either produced a possible implementation or worked out a detailed, well-specified design that handles all the fiddly corner cases of the sort mentioned above.

Haskell module naming conventions

How should I name my Haskell modules for a program, not a library, and organize them in a hierarchy?
I'm making a ray tracer called Luminosity. First I had these modules:
Vector Colour Intersect Trace Render Parse Export
Each module was fine on it's own, but I felt like this lacked organization.
First, I put every module under Luminosity, so for example Vector was now Luminosity.Vector (I assume this is standard for a haskell program?).
Then I thought: Vector and Colour are independent and could be reused, so they should be separated. But they're way too small to turn into libraries.
Where should they go? There is already (on hackage) a Data.Vector and Data.Colour, so should I put them there? Or will that cause confusion (even if I import them grouped with my other local imports)? If not there, should it be Luminosity.Data.Vector or Data.Luminosity.Vector? I'm pretty sure I've seen both used, although maybe I just happened to look at a project using a nonconventional structure.
I also have a simple TGA image exporter (Export) which can be independent from Luminosity. It appears the correct location would be Codec.Image.TGA, but again, should Luminosity be in there somewhere and if so, where?
It would be nice if Structure of a Haskell project or some other wiki explained this.
Unless your program is really big, don't organize the modules in a hierarchy. Why not? Because although computers are good at hierarchy, people aren't. People are good at meaningful names. If you choose good names you can easily handle 150 modules in a flat name space.
I felt like [a flat name space] lacked organization.
Hierarchical organization is not an end in itself. To justify splitting modules up into a hierarchy, you need a reason. Good reasons tend to have to do with information hiding or reuse. When you bring in information hiding, you are halfway to a library design, and when you are talking about reuse, you are effectively building a library. To morph a big program into "smaller program plus library" is a good strategy for software evolution, but it looks like you're just starting, and your program isn't yet big enough to evolve that way.
These issues are largely independent of the programming language you are using. I recommend reading some of David Parnas's work on product lines and program families, and also Matthias Blume's underappreciated paper Hierarchical Modularity. These works will give you some more concrete ideas about when hierarchy starts to serve a purpose.
First of all I put every module under Luminosity
I think this was a good move. It clarifies to anyone that is reading the code that these modules were made specifically for the Luminosity project.
If you write a module with the intent of simulating or improving upon an existing library, or of filling a gap where you believe a particular generic library is missing, then in that rare case, drop the prefix and name it generically. For an example of this, see how the pipes package exports Control.Monad.Trans.Free, because the author was, for whatever reason, not satisfied with existing implementations of Free monads.
Then I thought, Vector and Colour are pretty much independent and could be reused, so they should be separated. But they're way to small to separate off into a library (125 and 42 lines respectively). Where should they go?
If you don't make a separate library, then probably leave them at Luminosity.Vector and Luminosity.Colour. If you do make separate libraries, then try emailing the target audience of those libraries and see how other people think these libraries should be named and categorized. Whether or not you split these out into separate libraries is entirely up to you and how much benefit you think these separate libraries might provide for other people.

Where do QuickCheck instances belong in a cabal package?

I have a cabal package that exports a type NBT which might be useful for other developers. I've gone through the trouble of defining an Arbitrary instance for my type, and it would be a shame to not offer it to other developers for testing their code that integrates my work.
However, I want to avoid situations where my instance might get in the way. Perhaps the other developer has a different idea for what the Arbitrary instance should be. Perhaps my package's dependency on a particular version of QuickCheck might interfere with or be unwanted in the dependencies of the client project.
My ideas, in no particular order, are:
Leave the Arbitrary instance next to the definition of the type, and let clients deal with shadowing the instance or overriding the QuickCheck version number.
Make the Arbitrary instance an orphan instance in a separate module within the same package, say Data.NBT.Arbitrary. The dependency on QuickCheck for the overall package remains.
Offer the Arbitrary instance in a totally separate package, so that it can be listed as a separate test dependency for client projects.
Conditionally include both the Arbitrary instance and the QuickCheck dependency in the main package, but only if a flag like -ftest is set.
I've seen combinations of all of these used in other libraries, but haven't found any consensus on which works best. I want to try and get it right before uploading to Hackage.
On the basis of not much specific experience, but a general desire for robustness, the guiding principle for package dependencies should perhaps be
From each according to their ability; to each according to their need.
It's good to keep the dependencies of a package to the minimum needed for its essential functionality. That suggests option 3 or option 4 to me. Of course, it's a pain to chop the package up so much. If options are capable of expressing the conditionality involved, then option 4 sounds like a sensible approach, based on using language effectively to say what you mean.
It would be really good if a consensus emerged about which one switch we need to flick to get the testing kit as well as the basic functionality.
It's also clear that there's room for refinement here. It's amazing that Cabal works as well as it does, but it could allow for more sophisticated notions of "package", perhaps after the manner of the SML module system. Translating dependencies into function types, we basically get to write
simplePackage :: (Dependency1, .., Dependencyn) -> Deliverable
but one could imagine more elaborate combinations of products and functions, like
fancyPackage :: BasicDependency -> (BasicDeliverable, HelpfulExtras -> Gravy)
Until then, pick the option that most accurately reflects the actual deal. And tell us about it, so we can build that consensus.
The problem comes down to: how likely is it that someone using your library will be wanting to run QuickCheck tests using your NBT type?
If it is likely, and the Arbitrary instance is detailed (and thus not likely to change for different people), it would probably be best to ship it with your package, especially if you're going to make sure you keep updating the package (as for using a flag or not, that comes down to a bit of personal preference). If the instance is relatively simple however (and thus more likely that people would want to customise it), then it might be an idea to just provide a sample instance in the documentation.
If the type is primarily internal in nature and not likely to be used by others wanting to run tests, then using a flag to conditionally bring in QuickCheck is probably the best way to go to avoid unnecessary dependencies (i.e. the test suite is there just so you can test the package).
I'm not a fan of having QuickCheck-only packages in general, though it might be useful in some situations.

Resources