How do I determine reasonable package dependency bounds when releasing a Haskell library? - haskell

When releasing a library on Hackage, how can I determine reasonable bounds for my dependencies?
It's a very brief question - not sure what additional information I can provide.
It would also be helpful to know if this is handled differently depending on whether stack or cabal is used.
Essentially, my question relates to the cabal constraints, which are currently set as:
library
  hs-source-dirs: src
  default-language: Haskell2010
  exposed-modules: Data.ByteUnits
  build-depends: base >=4.9 && <4.10
               , safe == 0.3.15
I don't think the == is a good idea.

This is a tricky question, since there are different opinions in the community on best practices, and there are trade-offs between ease of figuring out bounds and providing the most compatibility with versions of dependencies possible. As I see it, there are basically three approaches you can take:
1. Look at the versions of the dependencies you're currently using, e.g. safe-0.3.15. Assume that the package follows the PVP and will not release a breaking change before version 0.4, and add this: safe >= 0.3.15 && < 0.4
2. The above is nice, but rules out a lot of potentially valid build plans. You can spend time testing against other versions of the dependency. For example, if you test against 0.2.12 and 0.4.3 and they both seem to work, you may want to expand to safe >= 0.2.12 && < 0.5.
   NOTE: A common mistake is that in a future version of your package you forget to check compatibility with older versions, and it turns out you're using a new feature introduced in, say, safe-0.4.1, making the old lower bound invalid. There's unfortunately not much in the way of automated tooling to check for this.
3. Just forget all of it: no version bounds at all, and make it the responsibility of the consumer of the package to ensure compatibility in a build plan. This has the downside that it's possible to create invalid build plans, but the upside that your bounds won't eliminate potentially good ones. (This is basically a false positive vs false negative tradeoff.)
The Stackage project runs nightly builds that can often let you know when your package is broken by new versions of dependencies, and make it easier for users to consume your package by providing pre-built snapshots that are known to work. This especially helps with case (3), and a little bit with the loose lower bounds in (2).
You may also want to consider using a Travis configuration that tests against old Stackage snapshots, e.g. https://github.com/commercialhaskell/stack/blob/master/doc/travis-complex.yml
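The first approach above can be written more compactly with Cabal's caret operator. This is only a sketch and it assumes the package declares cabal-version 2.0 or later, since older spec versions do not accept ^>=. The stanza is the one from the question; the package name and version in the header are hypothetical.

```cabal
cabal-version: 2.2
-- Hypothetical package header, for illustration only.
name:          my-library
version:       0.1.0.0

library
  hs-source-dirs:   src
  default-language: Haskell2010
  exposed-modules:  Data.ByteUnits
  -- safe ^>=0.3.15 is shorthand for safe >=0.3.15 && <0.4
  build-depends:    base >=4.9 && <4.10
                  , safe ^>=0.3.15
```

A wider, tested range such as safe >= 0.2.12 && < 0.5 still has to be spelled out with explicit >= and < bounds.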

I assume you're aware of the Haskell Package Versioning Policy (PVP). This provides some guidance, both implicitly in the meaning it assigns to the first three components of the version ("A.B.C") plus some explicit advice on Cabal version ranges.
Roughly speaking, future versions with the same "A.B" will not have introduced any breaking changes (including introducing orphan instances that might change the behavior of other code), but might have added new bindings, types, etc. Provided you have used only qualified imports or explicit import lists:
import qualified Something as S
import Something (foo, bar)
you can safely write a dependency of the form:
something >= 1.2.0 && < 1.6
where the assumption would be that you've tested 1.2.0 through 1.5.6, say, and you're confident that it'll continue to run with all future 1.5.xs (non-breaking changes) but could conceivably break on a future 1.6.
If you have imported a package unqualified (which you might very well do if you are re-exporting a big chunk of its API), you'll want a variant of:
the-package >= 1.2.0 && < 1.5.4 -- tested up to 1.5.3 API
the-package >= 1.5.3 && < 1.5.4 -- actually, this API precisely
There is also a caveat (see the PVP) if you define an orphan instance.
Finally, when importing some simple, stable packages where you've imported only the most obviously stable components, you could probably make the assumption that:
the-package >= 1.2.0 && < 2
is going to be pretty safe.
Looking at the Cabal file for a big, complex, well-written package might give you some sense of what's done in practice. The lens package, for example, extensively uses dependencies of the form:
array >= 0.3.0.2 && < 0.6
but has occasional dependencies like:
free >= 4 && < 6
(In many cases, these broader dependencies are on packages written by the same author, and he can obviously ensure that he doesn't break his own packages, so can be a little more lax.)

The purpose of the bounds is to ensure the version of the dependency you use has the feature(s) that you need. There is some earliest version X that introduces all those features, so you need a lower bound that is at least X. It's possible that a required feature is removed from a later version Y, in which case you would need to specify an upper bound that is less than Y:
build-depends: foo >= X && < Y
Ideally, a feature you need never gets removed, in which case you can drop the upper bound. This means the upper bound is only needed if you know your feature disappears from a later version. Otherwise, assume that foo >= X is sufficient until you have evidence to the contrary.
foo == X should rarely be used; it is basically short for foo >= X && <= X, and states that you are using a feature that is only in version X; it wasn't in earlier versions, and it was removed in a later version. If you find yourself in such a situation, it would probably be better to try to rewrite your code to not rely on that feature anymore, so that you can return to using foo >= Z (by relaxing the requirement for version X exactly, you may be able to get by with an even earlier version Z < X of foo).

A “foolproof” answer would be: allow exactly those versions that you're sure will work successfully! If you've only ever compiled your project with safe-0.3.15, then technically speaking you don't know whether it'll also work with safe-0.3.14, thus the constraint that cabal offers is right. If you want compatibility with other versions, test them by successively going backwards. The easiest way to do this is to disable the constraint completely in the .cabal file and then run
$ cabal configure --constraint='safe==XYZ' && cabal test
for each version XYZ = 0.3.14, and so on.
Practically speaking, that's a bit of a paranoid approach. In particular, it's good etiquette for packages to follow the Package Versioning Policy, which demands that new minor versions should never break any builds. I.e., if 0.3.15 works, then 0.3.16 etc. should at any rate work too. So the conservative constraint if you've only checked 0.3.15 would actually be safe >=0.3.15 && <0.4. Probably, safe >=0.3 && <0.4 would be safe† too. The PVP also requires that you don't use looser major-version bounds than you can confirm to work, i.e. it mandates the <0.4 constraint.
Often, this is still needlessly strict. It depends on how tightly you work with some package. In particular, sometimes you'll need to explicitly depend on a package just for some extra configuration function of a type used by a more important dependency. In such a case, I tend to not give any bounds at all for the ancillary package. For an extreme example, if I depend on diagrams-lib, then there's no good reason to give any bounds to diagrams-core, because that is coupled to diagrams-lib anyway.
I also don't usually bother with bounds for very stable and standard packages like containers. The exception is of course base.
†Did you have to pick the safe package as an example?

Related

SemVer: Do different results for the same seed warrant a major change?

Say I have written a piece of software (in R, for didactic purposes) which is following the Semantic Versioning Specification. This is the content of version 1.0.0 of the software:
funk <- function(x) {
jitter(x)
}
Which works so that
set.seed(1)
print(funk(0))
yields
[1] -0.009379653
Now suppose I change my function to this:
funk <- function(x) {
unrelated_random_stuff <- sample(1:10)
jitter(x)
}
And now, set.seed(1); print(funk(0)) yields
[1] -0.01176102
According to SemVer, does this constitute a major change? I.e., if I publish the software with these changes, should it be 2.0.0? I'm inclined to think so, since this technically changes results from scripts based on version 1.0.0, but I am not sure this qualifies as "breaking backwards compatibility" since we're talking about randomly-generated numbers.
If your customers are inclined to take a dependency on the output value, then yes, you probably want to bump the major version number. Even if this is library code, it's possible someone is using it for fuzz testing, where it's critically important to get reproducible results in order to track down and fix bugs, as well as to ensure the fix does not regress in the future.

Why can't I replace libraries distributed with GHC? What would happen if I did?

I see in this answer and this one that "everything will break horribly" and Stack won't let me replace base, but it will let me replace bytestring. What's the problem with this? Is there a way to do this safely without recompiling GHC? I'm debugging a problem with the base libraries and it'd be very convenient.
N.B. when I say I want to replace base I mean with a modified version of base from the same GHC version. I'm debugging the library, not testing a program against different GHC releases.
Most libraries are collections of Haskell modules containing Haskell code. The meaning of those libraries is determined by the code in the modules.
The base package, though, is a bit different. Many of the functions and data types it offers are not implemented in standard Haskell; their meaning is not given by the code contained in the package, but by the compiler itself. If you look at the source of the base package (and the other boot libraries), you will see many operations whose complete definition is simply undefined. Special code in the compiler's runtime system implements these operations and exposes them.
For example, if the compiler didn't offer seq as a primitive operation, there would be no way to implement seq after the fact: no Haskell term that you can write down will have the same type and semantics as seq unless it uses seq (or one of the Haskell extensions defined in terms of seq). Likewise, many of the pointer operations, ST operations, concurrency primitives, and so forth are implemented in the compiler itself.
Not only are these operations typically unimplementable, they also are typically very strongly tied to the compiler's internal data structures, which change from one release to the next. So even if you managed to convince GHC to use the base package from a different (version of the) compiler, the most likely outcome would simply be corrupted internal data structures with unpredictable (and potentially disastrous) results -- race conditions, trashing memory, space leaks, segfaults, that kind of thing.
If you need several versions of base, just install several versions of GHC. It's been carefully architected so that multiple versions can peacefully coexist on a single machine. (And in particular installing multiple versions definitely does not require recompiling GHC or even compiling GHC a first time, which seems to be your main concern.)

How to pin dependencies in Haskell apps

I'm writing a todo.sh in Haskell now, to better understand how IO monads work, and I'm going to use cmdArgs to parse input, like argparse does in Python.
My question is, how can I pin the dependency of cmdArgs like pip's requirements.txt?
Django==1.5.1
South==0.7.6
And, is it OK to distribute my package on Hackage?
Use the build-depends field in your .cabal file
build-depends:
  cmdargs == 0.10.3
But specifying one exact version is usually not the best idea, so
build-depends:
  cmdargs >= 0.8 && < 0.11
specifies a range of admissible versions.
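For an application, as opposed to a library, a closer analogue of requirements.txt is to keep the range in the .cabal file and pin the exact version at the project level. A minimal sketch, assuming a cabal-install recent enough to read cabal.project files; the file below is hand-written for illustration.

```cabal
-- cabal.project, placed next to the package's .cabal file
packages: .

-- Pin the exact version used for this build; the .cabal file keeps
-- the wider range cmdargs >= 0.8 && < 0.11.
constraints: cmdargs ==0.10.3
```

Running cabal freeze records the versions chosen for the whole build plan in a file that uses the same constraints format, so the pins do not have to be maintained by hand.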
And, is it OK to distribute my package on Hackage?
Not if you know that it won't ever be useful to anyone.
In other words, yes, sure it is okay. You need an account on Hackage for that, and that may take some time to obtain, though.

Haskell Repa-0.2.0.1 with vector-0.9

First off, I love repa and repa-devil, but most of my libraries require vector >= 0.9. Since we are on GHC 7.0.*, we need to use repa-0.2.0.*, but these have a hard dependency on vector >= 0.7 && < 0.8. I was able to get repa-0.2.0.* to compile with vector-0.9, but am a bit concerned that there might be some problems lurking under the surface.
So, is it OK to relax the upper bound on the vector dependency in repa 0.2.0.1?
That should be okay if it compiles. But to prevent cabal-install from making difficulties, you should increase the version of your repa with relaxed dependencies. Pick an increase that's unlikely to become an official version number, e.g. append a .1 to the version. When installing new packages, cabal-install takes the dependencies from the global index, so if you have a version with official dependencies vector < 0.8, it will think that's broken and try to reinstall it, which won't work.
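As a rough sketch of that change, the locally patched repa.cabal might differ from the released one in just two fields. The excerpt is hypothetical: the real file contains many more fields and dependencies, and the new upper bound of 0.10 is an assumption chosen only to admit vector-0.9.

```cabal
-- Excerpt from a locally patched repa.cabal (hypothetical; only the two
-- changed fields are shown, everything else stays as released).
name:    repa
-- was 0.2.0.1; the extra component marks the local, unofficial build
version: 0.2.0.1.1

library
  -- was: vector >= 0.7 && < 0.8
  build-depends: vector >= 0.7 && < 0.10
```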

hackage package dependencies and future-proof libraries

In the dependencies section of a cabal file:
Build-Depends: base >= 3 && < 5, transformers >= 0.2.0
Should I be doing something like
Build-Depends: base >= 3 && < 5, transformers >= 0.2.0 && < 0.3.0
(putting upper limits on versions of packages I depend on)
or not?
I'll use a real example: my "List" package on Hackage (List monad transformer and class)
If I don't put the limit - my package could break by a change in "transformers"
If I do put the limit - a user that uses "transformers" but is using a newer version of it will not be able to use lift and liftIO with ListT because it's only an instance of these classes of transformers-0.2.x
I guess that applications should always put upper limits so that they never break, so this question is only about libraries:
Shall I use the upper version limit on dependencies or not?
There is an explicit policy recommending upper bounds - see in particular section 3 ("Dependencies in Cabal"). The other answers give some further justification for this policy.
In short - the upper limit should be of the form < A.(B+1), where A and B are the first two components of the current version (A.B.C...). This is because increasing A.B should mean that the version breaks old APIs.
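A worked example of the < A.(B+1) rule may help. The version numbers and the name some-package below are illustrative, not taken from any particular release.

```cabal
library
  -- Worked example of the < A.(B+1) rule:
  --   transformers-0.2.2.0 : A.B = 0.2  =>  upper bound < 0.3
  --   some-package-1.5.6   : A.B = 1.5  =>  upper bound < 1.6
  build-depends: transformers >= 0.2.0 && < 0.3
               , some-package >= 1.5.6 && < 1.6
```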
Think about the failure modes:
With the upper bound, either your package builds or cabal bleats about an unsatisfied build dependency. Blame is clearly assigned.
Without the upper bound, a customer has a recent version of transformers and it's not backwards compatible. Your software fails to build; GHC bleats about how your code doesn't compile. Your software looks shoddy.
Put in the upper bound.
IMO putting upper bounds on the accepted version numbers is the right thing to do. Given the semantics of version numbers used by Hackage there is certainly no guarantee that your package will work with, in this case, transformers 0.3.0.
I haven't seen any real discussion about this though and there doesn't seem to be a general recommendation to use upper bounds except for the base package.
