I can't figure out where to define Arbitrary instances for my datatype. If I put it in the package, then the package unnecessarily has to have QuickCheck as a dependency. If I put it in the tests, then other packages can't make use of the instance. If I put it in a separate test-utils package, then the tests also have to live in a separate package, it's an orphan instance, and stack test --coverage doesn't work either.
What other options are there?
I'd usually pick the separate package option — but then I don't use stack test --coverage. Thanks for introducing me to it!
(Edit: I'd probably do this, and then use the test flag option only for running stack test --coverage --flag thepackage:arbitrary so that nobody else has to deal with the flags.)
It may also be worth raising the --coverage issue on the stack issue tracker, as it would be good for the coverage check to work in this case.
You ask for other options — the best one is probably a test flag.
A test flag
It is possible to define a flag in your cabal file (defaulting to false) which will only build the modules with your QuickCheck dependency if the flag is selected.
Place the required code in the directory arbitrary (for example). Then add the equivalent of the following to the relevant parts of your package.yaml (1st snippet) or the-library.cabal (2nd snippet) file:
flags:
  arbitrary:
    description: Compile with arbitrary instances
    default: false
    manual: true

library:
  ⁝
  when:
  - condition: flag(arbitrary)
    dependencies:
    - QuickCheck
    source-dirs:
    - arbitrary
flag arbitrary
  description: Compile with arbitrary instances
  manual: True
  default: False

library
  ⁝
  if flag(arbitrary)
    hs-source-dirs:
      arbitrary
    build-depends:
      QuickCheck
Then, packages which want to use the instances should add the following in their stack.yaml (1st) or cabal.project (2nd) files:
flags:
  the-library:
    arbitrary: true
constraints: the-library +arbitrary
But there's a slight problem… there is currently no way for that library to depend on the +arbitrary version only in its test suite, unless it also defines such a flag. This may be a price worth paying.
Note: I haven't tested the downstream packaging, yet.
Ivan Miljenovic's blog was useful as an initial resource.
DerivingVia/Generic instances
There may be another possibility, now that GHC 8.6 has been released, with DerivingVia. There's a case study in Blöndal, Löh & Scott (2018) for Arbitrary instances.
You would create newtype wrappers, and implement Arbitrary for those newtypes.
It doesn't quite avoid the problem, as is. But you may be able to implement Generic for those newtypes in such a way that the instances derivable using generic-arbitrary match what you'd want.
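A minimal sketch of what that might look like (GHC 8.6+; UserId, ValidUserId and AdminId are made-up names, purely for illustration; the wrapper and its instance live in the test code, so nothing here is an orphan):
{-# LANGUAGE DerivingVia #-}
module Wrappers where

import Test.QuickCheck (Arbitrary (..), choose)

newtype UserId = UserId Int
  deriving (Show, Eq)

-- Wrapper carrying the generator we want; the instance is attached to the
-- wrapper, not to UserId, so it is not an orphan.
newtype ValidUserId = ValidUserId UserId
  deriving (Show, Eq)

instance Arbitrary ValidUserId where
  arbitrary = ValidUserId . UserId <$> choose (1, 999999)

-- Other newtypes over UserId can borrow that generator via DerivingVia:
newtype AdminId = AdminId UserId
  deriving (Show, Eq)
  deriving Arbitrary via ValidUserId
As noted above, this doesn't give you an instance for UserId itself; your properties have to quantify over the wrapper instead.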
There may be some other options. In particular, QuickCheck's dependencies aren't actually that heavy. And there are other testing libraries, too. Plus, note that there has been some discussion of separating an Arbitrary-like typeclass into a standalone library.
I would also have suggested internal libraries, but that doesn't allow other packages to use your instances.
Related
GHC warns when a package depends on different instances of the same package via dependencies, e.g.:
Configuring tasty-hspec-1.1.5.1...
Warning:
This package indirectly depends on multiple versions of the same package.
This is very likely to cause a compile failure.
package hspec-core (hspec-core-2.5.5-H06vLnMfEeIEsZFdji6h0O) requires
clock-0.7.2-9qwmBbNbGzEOSffjlyarp
package tasty (tasty-1.1.0.3-I8Vu9v0lHj8Jlg3jpKXavp) requires
clock-0.7.2-Cf9UTsaN2AjEpBnoMpmgkU
Two things are unclear to me with respect to this warning:
If GHC warns, and the compile doesn't fail, is everything fine? That is, could subtly conflicting versions of the same package still cause bad behaviour? (I'm imagining something like a type (Int, Int) in the public interface, with the two versions of the package disagreeing on the order of the fields.)
Is there a way to make GHC fail for this warning?
It's not GHC that's warning you about multiple package versions. GHC just compiles the packages it is told to use... which hardly anybody ever specifies by hand; most people let Stack or Cabal do it for them. And in this case it's Cabal giving the warning message.
If different versions cause a problem, you will in practice almost always see it at compile-time. It's most often a missing-instance error, because you're e.g. trying to use the class Foo from pkg-1.0 with the type Bar from pkg-2.0. Direct version mismatch of a data type in public interfaces can also happen.
Theoretically, I think it would also be possible to have an error like (Int,Int) meaning two different things, which the compiler would not catch. However, this kind of change is asking for trouble anyways. Whenever the order of some data fields is not completely obvious and might change in the future, a data record should be used to make sure the compiler can catch it. (This is largely orthogonal to the different-versions-of-same-package issue.)
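For instance (a made-up illustration, not tied to the packages in the warning above), a record replaces a silently swappable pair with named fields, so a mismatch between versions surfaces in the types rather than in the data:
-- With a bare (Int, Int), nothing stops one version meaning (start, end)
-- and another meaning (end, start). A record names the roles, so code that
-- uses the field names either keeps working or fails to compile:
data Span = Span { spanStart :: Int, spanEnd :: Int }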
If you want to be safe from any version-mismatch issues, you can use Stack instead of Cabal. I think this is a good part of the reason why many Haskellers prefer Stack.
When releasing a library on Hackage, how can I determine reasonable bounds for my dependencies?
It's a very brief question - not sure what additional information I can provide.
It would also be helpful to know if this is handled differently depending on whether stack or cabal is used.
Essentially my question relates to the cabal constraints, which are currently set as:
library
  hs-source-dirs:      src
  default-language:    Haskell2010
  exposed-modules:     Data.ByteUnits
  build-depends:       base >=4.9 && <4.10
                     , safe == 0.3.15
I don't think the == is a good idea.
This is a tricky question, since there are different opinions in the community on best practices, and there is a trade-off between how easy the bounds are to figure out and how many dependency versions they remain compatible with. As I see it, there are basically three approaches you can take:
1. Look at the version of the dependencies you're currently using, e.g. safe-0.3.15. Assume that the package is following the PVP and will not release a breaking change before version 0.4, and add this: safe >= 0.3.15 && < 0.4
2. The above is nice, but limits a lot of potentially valid build plans. You can spend time testing against other versions of the dependency. For example, if you test against, say, 0.2.12 and 0.4.3 and they both seem to work, you may want to expand to safe >= 0.2.12 && < 0.5.
   NOTE: A common mistake here is that in a future version of your package you forget to check compatibility with the older versions, and it turns out you're using a new feature introduced in, say, safe-0.4.1, making the old bounds invalid. There's unfortunately not much in the way of automated tooling to check for this.
3. Forget all of it: no version bounds at all, and make it the responsibility of the consumer of the package to ensure compatibility in a build plan. This has the downside that it's possible to create invalid build plans, but the upside that your bounds won't eliminate potentially good ones. (This is basically a false positive vs false negative tradeoff.)
The Stackage project runs nightly builds that can often let you know when your package is broken by new versions of dependencies, and make it easier for users to consume your package by providing pre-built snapshots that are known to work. This especially helps with case (3), and a little bit with the loose lower bounds in (2).
You may also want to consider using a Travis configuration that tests against old Stackage snapshots, e.g. https://github.com/commercialhaskell/stack/blob/master/doc/travis-complex.yml
I assume you're aware of the Haskell Package Versioning Policy (PVP). This provides some guidance, both implicitly in the meaning it assigns to the first three components of the version ("A.B.C") plus some explicit advice on Cabal version ranges.
Roughly speaking, future versions with the same "A.B" will not have introduced any breaking changes (including introducing orphan instances that might change the behavior of other code), but might have added new bindings, types, etc. Provided you have used only qualified imports or explicit import lists:
import qualified Something as S
import Something (foo, bar)
you can safely write a dependency of the form:
something >= 1.2.0 && < 1.6
where the assumption would be that you've tested 1.2.0 through 1.5.6, say, and you're confident that it'll continue to run with all future 1.5.xs (non-breaking changes) but could conceivably break on a future 1.6.
If you have imported a package unqualified (which you might very well do if you are re-exporting a big chunk of its API), you'll want a variant of:
the-package >= 1.2.0 && < 1.5.4 -- tested up to 1.5.3 API
the-package >= 1.5.3 && < 1.5.4 -- actually, this API precisely
There is also a caveat (see the PVP) if you define an orphan instance.
Finally, when importing some simple, stable packages where you've imported only the most obviously stable components, you could probably make the assumption that:
the-package >= 1.2.0 && < 2
is going to be pretty safe.
Looking at the Cabal file for a big, complex, well-written package might give you some sense of what's done in practice. The lens package, for example, extensively uses dependencies of the form:
array >= 0.3.0.2 && < 0.6
but has occasional dependencies like:
free >= 4 && < 6
(In many cases, these broader dependencies are on packages written by the same author, and he can obviously ensure that he doesn't break his own packages, so can be a little more lax.)
The purpose of the bounds is to ensure the version of the dependency you use has the feature(s) that you need. There is some earliest version X that introduces all those features, so you need a lower bound that is at least X. It's possible that a required feature is removed from a later version Y, in which case you would need to specify an upper bound that is less than Y:
build-depends: foo >= X && < Y
Ideally, a feature you need never gets removed, in which case you can drop the upper bound. This means the upper bound is only needed if you know your feature disappears from a later version. Otherwise, assume that foo >= X is sufficient until you have evidence to the contrary.
foo == X should rarely be used; it is basically short for foo >= X && <= X, and states that you are using a feature that is only in version X; it wasn't in earlier versions, and it was removed in a later version. If you find yourself in such a situation, it would probably be better to try to rewrite your code to not rely on that feature anymore, so that you can return to using foo >= Z (by relaxing the requirement for version X exactly, you may be able to get by with an even earlier version Z < X of foo).
A “foolproof” answer would be: allow exactly those versions that you're sure will work! If you've only ever compiled your project with safe-0.3.15, then technically speaking you don't know whether it'll also work with safe-0.3.14, so the constraint that cabal offers is right. If you want compatibility with other versions, test them by successively going backwards. The easiest way to do this is to completely disable the constraint in the .cabal file and then run
$ cabal configure --constraint='safe==XYZ' && cabal test
for each version XYZ = 0.3.14, etc.
Practically speaking, that's a bit of a paranoid approach. In particular, it's good etiquette for packages to follow the Package Versioning Policy, which demands that new minor versions should never break any builds. I.e., if 0.3.15 works, then 0.3.16 etc. should at any rate work too. So the conservative constraint if you've only checked 0.3.15 would actually be safe >=0.3.15 && <0.4. Probably, safe >=0.3 && <0.4 would be safe† too. The PVP also requires that you don't use looser major-version bounds than you can confirm to work, i.e. it mandates the <0.4 constraint.
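Concretely (a sketch, using the conservative bounds discussed above), the build-depends from the question would then become something like:
library
  hs-source-dirs:      src
  default-language:    Haskell2010
  exposed-modules:     Data.ByteUnits
  build-depends:       base >=4.9 && <4.10
                     , safe >=0.3.15 && <0.4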
Often, this is still needlessly strict. It depends on how tightly your code works with the package. In particular, sometimes you'll need to explicitly depend on a package just for some extra configuration function of a type used by a more important dependency. In such a case, I tend not to give any bounds at all for the ancillary package. For an extreme example, if I depend on diagrams-lib, there's no good reason to give any bounds on diagrams-core, because that is coupled to diagrams-lib anyway.
I also don't usually bother with bounds for very stable and standard packages like containers. The exception is of course base.
†Did you have to pick the safe package as an example?
I see in this answer and this one that "everything will break horribly" and Stack won't let me replace base, but it will let me replace bytestring. What's the problem with this? Is there a way to do this safely without recompiling GHC? I'm debugging a problem with the base libraries and it'd be very convenient.
N.B. when I say I want to replace base I mean with a modified version of base from the same GHC version. I'm debugging the library, not testing a program against different GHC releases.
Most libraries are collections of Haskell modules containing Haskell code. The meaning of those libraries is determined by the code in the modules.
The base package, though, is a bit different. Many of the functions and data types it offers are not implemented in standard Haskell; their meaning is not given by the code contained in the package, but by the compiler itself. If you look at the source of the base package (and the other boot libraries), you will see many operations whose complete definition is simply undefined. Special code in the compiler's runtime system implements these operations and exposes them.
For example, if the compiler didn't offer seq as a primitive operation, there would be no way to implement seq after-the-fact: no Haskell term that you can write down will have the same type and semantics as seq unless it uses seq (or one of the Haskell extensions defined in terms of seq). Likewise many of the pointer operations, ST operations, concurrency primitives, and so forth are implemented in the compiler themselves.
Not only are these operations typically unimplementable, they also are typically very strongly tied to the compiler's internal data structures, which change from one release to the next. So even if you managed to convince GHC to use the base package from a different (version of the) compiler, the most likely outcome would simply be corrupted internal data structures with unpredictable (and potentially disastrous) results -- race conditions, trashing memory, space leaks, segfaults, that kind of thing.
If you need several versions of base, just install several versions of GHC. It's been carefully architected so that multiple versions can peacefully coexist on a single machine. (And in particular installing multiple versions definitely does not require recompiling GHC or even compiling GHC a first time, which seems to be your main concern.)
I've got a couple of (independent) files that take quite a while to compile, so I thought I would try out parallel compilation, per Don Stewart's answer here.
I followed the directions here, so my makefile looks something like
quickbuild:
	ghc --make MyProg.hs -o MyProg

depend:
	ghc -M -dep-makefile makefile MyProg

# DO NOT DELETE: Beginning of Haskell dependencies
...
MyProg.o : MyProg.hs
MyProg.o : B.hi
MyProg.o : C.hi
...
# DO NOT DELETE: End of Haskell dependencies
(Note: contrary to the docs, GHC seems to default to "Makefile" rather than "makefile", even when "makefile" exists.)
My question is: How do I make quickbuild depend on any of the auto-gen dependencies (so that make will actually run in parallel)? I tried adding 'MyProg.o' to the dependency list of 'quickbuild', but 'make' (rightly) complained that there was no rule to build 'B.hi'.
I suggest not using make for this purpose.
Look at ghc-parmake and its issues, especially this one - GHC has a very sophisticated recompilation checker that you cannot replicate with Makefiles (it can detect e.g. if a package file outside of your own project changes).
You will also not get a large speedup (in practice rarely more than 2×) from a parallel make -j running multiple GHCs, since starting many separate GHC processes has a high overhead that ghc --make avoids. In particular, each fresh GHC invocation has to parse and typecheck the interface (.hi) files of all dependencies of the module it is compiling; ghc --make caches them.
Instead, use the new ghc --make -j of GHC 7.8 - it is truly parallel.
It will be more reliable and less effort than your manually written Makefile, and do recompilation avoidance better than Make can do with its file time stamps.
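For example (assuming GHC 7.8 or later), the quickbuild rule from the question could simply become:
quickbuild:
	ghc --make -j MyProg.hs -o MyProg
With -j and no argument, GHC uses one job per processor; -jN caps it at N.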
At first sight this sounds like a drawback of Haskell, but in fact it is not. In other languages that like to use make for building, say C++, it is impossible to notice when files outside of your project change; having the build system in the compiler itself, as ghc --make does, makes it possible to notice this.
I'm writing a snake game in Haskell. These are some of the things I have:
A Coord data type
A Line data type
A Rect data type
A Polygon type class, which allows me to get a Rect as a series of lines ([Line]).
An Impassable type class that allows me to get a Line as a series of Coords ([Coord]) so that I can detect collisions between other Impassables.
A Draw type class for anything that I want to draw to the screen (HSCurses).
Finally I'm using QuickCheck so I want to declare Arbitrary instances for a lot of these things.
Currently I have a lot of these in separate modules so I have lots of small modules. I've noticed that I have to import a lot of them for each other so I'm kind of wondering what the point was.
I'm particularly confused about Arbitrary instances. When using -Wall I get warnings about orphan instances when I put those instances together in one test file. My understanding is that I can avoid that warning by putting the instances in the same module where the data type is defined, but then I'd need to import Test.QuickCheck in all those modules, which seems silly because QuickCheck should only be required when building the test executable.
Any advice on the specific problem with QuickCheck would be appreciated as would guidance on the more general problem of how/where programs should be divided into modules.
You can have your cake and eat it too. You can re-export modules.
module Geometry
  ( module Coord, module Line, module Rect, module Polygon, module Impassable
  ) where
-- the re-exported modules must also be imported:
import Coord; import Line; import Rect; import Polygon; import Impassable
I usually use a module whenever I have a complete abstraction -- i.e. when a data type's meaning differs from its implementation. Knowing little about your code, I would probably group Polygon and Impassable together, perhaps making a Collision data type to represent what they return. But Coord, Line, and Rect seem like good abstractions and they probably deserve their own modules.
For testing purposes, I use separate modules for the Arbitrary instances. Although I generally avoid orphan instances, these modules only get built when building the test executable so I don't mind the orphans or that it's not -Wall clean. You can also use -fno-warn-orphans to disable just this warning message.
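For concreteness, such a test-only instance module might look like this sketch (the module name is a placeholder, and Coord is assumed to be a simple pair of Ints; the module is listed only in the test-suite stanza, so the library never sees QuickCheck):
{-# OPTIONS_GHC -fno-warn-orphans #-}
module ArbitraryInstances where

import Test.QuickCheck (Arbitrary (..), choose)
import Coord (Coord (..))

-- A deliberate orphan: the instance lives with the tests, not the library.
instance Arbitrary Coord where
  arbitrary = Coord <$> choose (0, 79) <*> choose (0, 24)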
I generally put more emphasis on the module interface as defined by the functions it exposes rather than the data types it exposes. Do some of your types share a common set of functions? Then I would put them in the same module.
But my practise is probably not the best since I usually write small programs. I would advise looking at some code from Hackage to see what package maintainers do.
If there were a way to sort packages by community rating or number of downloads, that would be a good place to start. (I thought there was, but now that I look for it, I can't find it.) Failing that, look at packages that you already use.
One solution with QuickCheck is to use the C preprocessor to selectively enable the Arbitrary instances when you are testing. You put the Arbitrary instances straight into your main modules but wrap them with preprocessor macros, then put a "test" flag into your Cabal file.
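A sketch of that approach (the TEST macro name is an assumption here, and Coord is again assumed to be a simple pair of Ints):
{-# LANGUAGE CPP #-}
module Coord (Coord (..)) where

#ifdef TEST
import Test.QuickCheck (Arbitrary (..))
#endif

data Coord = Coord Int Int
  deriving (Show, Eq)

#ifdef TEST
-- Only compiled when the build passes -DTEST, so the normal build
-- carries no QuickCheck dependency.
instance Arbitrary Coord where
  arbitrary = Coord <$> arbitrary <*> arbitrary
#endif
The library stanza would then gate both cpp-options: -DTEST and the QuickCheck build-depends behind that flag, much as in the test-flag answer earlier.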