Best practices for sharing third party dependencies with your own dependencies - python-3.x

My project has a dependency on another project, and I'm using a git dependency as follows in the setup.py file:
setup(
    name="cake",
    version="0.1",
    install_requires=[
        "flan @ git+ssh://git@github.com/terrymcguire/flan.git#egg=flan"
    ]
)
Suppose they both depend on pyyaml. Is it best practice to include a "pyyaml==5.1.2" inside both projects' setup.py, install_requires: ... (or requirements.txt as you prefer), and make sure the versions are the same? Or is it recommended to only have pyyaml listed as a dependency in the flan project, and then inherit the version in the parent project, even though it's then less clear that pyyaml is a dependency of the parent project, and if one day I no longer depend on flan I might not notice that I've broken other code?

1.
Is it best practice to include a "pyyaml==5.1.2" inside both projects' setup.py, install_requires: ... (or requirements.txt as you prefer) [...]?
Only applications should (possibly) pin requirements to a specific version. Libraries should restrict to a range of known-compatible versions (as accurately as possible).
In general, I believe pinning dependency versions in setup.py (or pyproject.toml) is a bad idea, since those pins cannot be (easily) overruled by the end user, i.e. the one ultimately installing the projects (whether applications or libraries) and the one who should have the last word on what gets installed. On the other hand, it is good practice to publish a recommended combination of pinned dependency versions that is known to work well (because it has been tested), in the form of a requirements.txt file that the end user may opt to use or not. This makes sense for applications; it doesn't make much sense for libraries.
Read for example Donald Stufft's article "setup.py vs requirements.txt".
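As a minimal sketch of that split (the pyyaml bounds below are illustrative, not a recommendation for pyyaml specifically): the library declares a compatible range in setup.py, while the application keeps its tested pins in a separate requirements.txt.

# flan/setup.py (the library): declare a compatible range, don't pin
from setuptools import setup

setup(
    name="flan",
    version="0.1",
    install_requires=[
        "pyyaml>=5.1,<6",  # illustrative range of versions known to work
    ],
)

The application (cake, here) can then ship a requirements.txt containing the tested pin, e.g. pyyaml==5.1.2, which the end user is free to use or ignore.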
2.
is it recommended to only have pyyaml listed as a dependency in the flan project, and then inherit the version in the parent project, even though it's then less clear that pyyaml is a dependency of the parent project [...]?
The general (obvious) rule is that all projects should list all of their own dependencies, and only their own dependencies. Anything else doesn't make sense (though, as always, there might be exceptions).
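Applied to the example above (a sketch; the ranges are illustrative): since cake imports pyyaml directly, it declares pyyaml itself, independently of what flan declares.

# cake/setup.py: cake uses pyyaml directly, so it lists pyyaml itself
from setuptools import setup

setup(
    name="cake",
    version="0.1",
    install_requires=[
        "flan @ git+ssh://git@github.com/terrymcguire/flan.git",
        "pyyaml>=5.1,<6",  # overlaps with flan's range; pip still installs a single pyyaml
    ],
)

If you later drop the flan dependency, pyyaml remains declared in cake, so nothing breaks silently.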

Related

npm: refer to a peer dependency; how to align the version from a peer dependency

In the abstract: I'm OK with the provided version of dependency-B, which is already installed thanks to dependency-A.
"dependencies": {
"dependency-A": "x.y.z",
}
$> npm ls --depth=1
├─┬ dependency-A#x.y.z
│ ├── dependency-B#x.y.z
So when I require('dependency-B'), I'll expect A's dependency.
I'm using the root function from that library and, in fact, if dependency-A bumps the version, I'd like to align with it and use the same version it uses.
If dependency-B is listed in my dependencies, a brand-new copy of the package will be installed.
"dependencies": {
"dependency-A": "x.y.z",
"dependency-B": "a.b.c",
}
$> npm ls --depth=1
├─┬ dependency-A#x.y.z
│ ├── dependency-B#x.y.z
│ ├── ...
├─┬ dependency-B#a.b.c
I'm tempted not to list dependency-B in my dependencies. Should I avoid this practice? Isn't it OK to rely on the peer version installed by my main dependency?
If this is a bad practice, how can I tell npm to give me the very same version that another package installs?
"dependencies": {
"dependency-A": "x.y.z",
"dependency-B": "~try the one that is installing dependency-A~",
}
tl;dr: You should always have all dependencies that you're using in your own dependencies object, as conformant implementations of package managers are not required to give you access to your dependencies' dependencies.
This is an interesting question, and I can think of two scenarios in which you might encounter this:
Both your package and dependency-A use dependency-B independently, each for its own reasons, and you simply don't care which version gets used.
You need to use dependency-B in order to interact with dependency-A, by creating objects of B or receiving objects of B created by A.
Scenario 1: Independent usage
If you and your dependency need the same package but don't need to share anything about it, Node gives you the amazing ability to use different versions of the same package in different places, by specifying different versions in your package's package.json and in your library's. This is one of the strengths of the Node module system.
Your situation, however, is that you don't care about the actual version of the package (which makes me think this is not your scenario). In particular, you wonder whether it's better not to declare anything in your own package.json and just let Node find your dependency's dependency.
This last situation is only possible because you're using npm, and npm does one particular thing: it flattens the module tree in an effort to deduplicate packages, that is, so that multiple dependency specifications that can be satisfied by the same version end up using the exact same installed copy. This reduces both the size and depth of the module tree, but creates the unintended consequence that you now have access to packages you haven't specified as dependencies, just because they were installed in your node_modules directory for the purpose of deduplication.
This is not the only possible strategy, though: pnpm, another package manager, instead uses symlinks to achieve the same goals. I won't go into much detail, but pnpm installs all dependencies in a separate, system-wide (or user-specific) directory, and then symlinks from your node_modules (and from the dependencies' own node_modules) to the appropriate location in that store. This achieves not only project-wide deduplication but system-wide deduplication, as all of your projects using a specific package version share the same installation. The consequence of this scheme, though, is that you "lose" the ability to use your dependencies' dependencies in your own package, because they're no longer physically in your node_modules.
Apart from all that, there is the idea that you don't care about the version they use. That's almost never the case, as the whole point of semantic versioning is to avoid or contain breakage due to dependency version upgrades. You may not care about the version you use now, but if that package gets upgraded to a different major version inside your dependency, your package can break unexpectedly.
In conclusion, not defining a dependency that you are going to use anyway is a bad practice, both because it prevents other developers from using your package in a different package manager, and because it opens you to unexpected breakage that you won't be able to properly manage.
Scenario 2: Dependent usage
The more likely scenario given your description of the problem is that at some point in your usage of dependency-A, either it asks for something or returns something from dependency-B. In this situation it is desirable that both use the same, or at least compatible versions, so that all assumptions about the shape of the objects that are being exchanged hold.
The correct way of specifying this situation is for dependency-A to explicitly declare dependency-B as a peer dependency. If that's not the case, they're doing it wrong, and you should most definitely bring that up in an issue if possible. As a workaround, you might just declare the same version as they do and be wary of possible breakages due to version upgrades on their part. Not declaring anything in your own package.json can have the same problems as in Scenario 1.
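For illustration, a sketch of what declaring that relationship looks like (the ^a.b.c range is a placeholder, in the same spirit as the x.y.z placeholders above). In dependency-A's own package.json:
"peerDependencies": {
  "dependency-B": "^a.b.c"
}
and in your package.json:
"dependencies": {
  "dependency-A": "x.y.z",
  "dependency-B": "^a.b.c"
}
That way the shared version range is explicit on both sides, and npm can warn you when the two drift apart.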
However, another possibility is that you don't even need to require that dependency. It might be the case that they expect you to pass data, functions, objects or anything else that will be further passed to dependency-B, but in a way that shields you from ever having to interact with dependency-B directly. In this situation, they're essentially incorporating part of B's API into their own, and therefore any breaking change in dependency-B should also mean a breaking change of dependency-A. This shields you from unexpected breakages, saves you from having to declare anything in your package.json, and means you're safe.

Is a package excluded from Stackage LTS because of an omitted dependency?

I'm a bit confused about how a dependency on a package affects including it in Stackage LTS; specifically, if
package A requires package B, and
package A works when package B is installed as an extra-dep on top of LTS-X.Y, but
package B itself is not in LTS-X.Y,
does package A have to be excluded from LTS-X.Y, particularly if
the only reason B is excluded is because of a test suite dependency, not a dependency in the library itself?
I'll copy/paste my answer from GitHub:
does package A have to be excluded?
No, it doesn't have to be excluded. Here's why:
even if the only reason B is excluded is because of a test suite dependency
In this case, we can add B to the build plan and mark it under the skipped-tests section in order to avoid pulling in its test suite dependencies. This is true of both LTS and nightly snapshots.
(However, a preferable course of action would be to remedy B's dependency issue so that the test suite can be run.)
To further clarify, in response to @bergey's answer:
packages are only included if the package's maintainer agrees to keep it up to date with respect to its dependencies
This is only true of packages explicitly included. Some packages are transitive dependencies, which are included implicitly, and are not necessarily held to such strict standards. (However, in the future we may eliminate the concept of implicit inclusion and instead include all packages explicitly.)
Exceptions can also be made so that a package may be included even though its test suite or its benchmarks have dependency constraints that are incompatible with the snapshot.
Of course the preferred way to go is to not need to make such exceptions, and we encourage all maintainers to keep all of their build targets up to date.
Finally, allow me to note that this question would probably be better suited for the Stackage mailing list, which is admittedly not well publicized or utilized.
Yes, for every package in a given Stackage snapshot, all of its transitive dependencies are also in the snapshot. Also, packages are only included if the package's maintainer agrees to keep it up to date with respect to its dependencies. There are more details about this in the README on github. An excerpt:
All packages are buildable and testable from Hackage. We recommend the Stack Travis script, which ensures a package is not accidentally incomplete.
All packages are compatible with the newest versions of all dependencies (You can find restrictive upper bounds by visiting http://packdeps.haskellers.com/feed?needle=PACKAGENAME).
All packages in a snapshot are compatible with the versions of libraries that ship with the GHC used in the snapshot (more information on lenient lower bounds).

What is Cabal Hell?

I am a little bit confused while reading about Cabal Hell, as the term is overloaded. I guess originally Cabal Hell referred to the diamond dependency problem, which was solved by restricting each build plan to a single version of any given package (two different versions of a package can't coexist in a single build plan), as explained in this answer.
However, the term is also used in various other contexts, such as destructive re-installations, incorrect package dependency bounds (lower/upper version bounds), inconsistent environments, and so on (or any other error reported by Cabal).
In particular, I am confused about 1) destructive re-installations and 2) inconsistent environments. What do they mean, and how does cabal new-build solve these problems (is it just sandboxing, like cabal sandbox)? And what role does ghc-pkg play here?
Any references or a simple example where these problems could be reproduced would be very appreciated.
Regarding "destructive re-installations": If I am not wrong, GHC has a package manager of itself (ghc-pkg), and the packages are installed as dynamically linkable libraries i.e: base depends on ghc-prim, so if ghc-prim is removed it will break base, am I right? And since GHC only allows one instance of a package with the same version, cabal install might register a newer build of the same (package, version) such that it breaks the dependents of the unregistered package. If the above understanding regarding "destructive re-installations" are correct; how does cabal new-build help here?
The only meaningful use of the term is the one given in the linked answer. Related are the follow-on problems from having lots of different packages in the global database, which can make encountering diamond dependencies more common, requiring destructive reinstalls to resolve, etc.
The other usages of the term are not helpful and just mean "problems somehow involving cabal."
That said, let me answer your other questions.
1) ghc-pkg is not a package manager, but rather a tool for managing ghc package databases. It is used by cabal to register packages into databases, and can be used by end-users to inspect the contents of the databases. Think of it as part of the underlying substrate provided by ghc, not a competing tool.
2) new-build eliminates and replaces the standard notion of a packagedb entirely. Instead of a db consisting of packages and versions, with at most one entry per (package, version) pair, the db consists of potentially many copies of a package at any given version, each with potentially different versions of its dependencies, all managed in part by hash addressing, so each copy is marked by a unique "fingerprint". This is called the store. When you new-build, cabal calculates a build plan from scratch, irrespective of any previously installed dependencies. If a particular fingerprint (consisting of a package, its version, the versions of all its dependencies, certain flags, etc.) already exists in the store, then it makes use of it. If it does not, it builds it.
As such, the only "diamond dependencies" that can occur are the truly insoluble ones, and not the ones occasioned by having fixed too-early (due to already-installed deps) some portion of the dependency tree.
tl;dr: you write "since GHC only allows one instance of a package with the same version", but new-build partially lifts this restriction in the store, which allows the solver to produce better, more reproducible plans more often.

Where to install multiple compiler-specific libraries on UNIX-like systems

I need to install the same C++/Fortran library compiled with different compilers on the system with CMake. Is there a standard location where to install the different compiler-specific versions of the same library on the system? For example, assuming that lib.so and lib.a have already been installed using the system package manager under /usr/, is it good practice to install each of the additional compiler-specific versions in a different folder under, let's say, /usr/local? Or is there a better way of doing this that you can advise?
It depends on how many compilers/libraries/versions you have. If you have just a few of them, I think that (almost) any choice of location is fine, but I personally prefer /opt paths for manually installed code. However, if you start having several combinations of them, you easily get into trouble. Besides, I think the question of the "best" location is related to the question of the "best" way to switch from using one library to another, possibly avoiding having to manually set LD_LIBRARY_PATH, the libraries to link, or similar things.
I give some personal recommendations according to my experience for systems where you want to support many libraries/applications with many compilers/versions and also provide them for many users:
Do not use root user to install compiled software: just use an "installer account" and give read and execute permissions when needed to other users
Select a path for compiled software, e.g. /opt, and define two subfolders, /opt/build and /opt/install: the first one for your sources and where you compile them, the second as the installation target
Create some subfolders based on categories, e.g. /compilers, /libraries, /applications, ..., under both /opt/build and /opt/install
Start preparing compilers under /compilers, e.g. /compilers/gnu/6.3 or /compilers/intel/2017. When possible, compile them, e.g. from /opt/build/compilers/gnu/6.3 into /opt/install/compilers/gnu/6.3, or just put the binaries into the install tree, e.g. /opt/install/compilers/intel/2017
Prepare the tree for libraries (or applications) by adding subfolders that specify the version, the compiler and the compiler version, e.g. compile in /opt/build/libraries/boost/1.64.0/gnu/6.3 and install to /opt/install/libraries/boost/1.64.0/gnu/6.3
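As a sketch of how that last step plays out with CMake (mylib, its version and the source path are hypothetical; the compiler paths assume the GCC 6.3 install from the tree above):
$> cd /opt/build/libraries/mylib/1.0.0/gnu/6.3
$> cmake /path/to/mylib/source \
     -DCMAKE_C_COMPILER=/opt/install/compilers/gnu/6.3/bin/gcc \
     -DCMAKE_CXX_COMPILER=/opt/install/compilers/gnu/6.3/bin/g++ \
     -DCMAKE_Fortran_COMPILER=/opt/install/compilers/gnu/6.3/bin/gfortran \
     -DCMAKE_INSTALL_PREFIX=/opt/install/libraries/mylib/1.0.0/gnu/6.3
$> make && make install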
At this stage, things are well organized. But:
It is difficult to select the library you want to use: you have to set LD_LIBRARY_PATH or manually link the right one, and the situation gets worse when you also deal with applications
You are not considering dependencies between libraries: how can I force using g++ 6.3 when linking against boost/1.64.0/gnu/6.3?
To address these and many other issues, a good approach is to use a tool that can help you, e.g. Environment Modules (http://modules.sourceforge.net/), so that you can easily switch from one library to another, enforce dependencies, get help, and in general have something less error-prone in daily usage.
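For example, with Environment Modules and modulefiles named after the install tree above (the module names are illustrative), daily usage reduces to:
$> module load compilers/gnu/6.3
$> module load libraries/boost/1.64.0/gnu/6.3
Each modulefile typically prepends the right bin/, lib/ and include/ directories to PATH, LD_LIBRARY_PATH and similar variables, and can declare prerequisites (prereq) so that the Boost module requires the matching compiler module.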

Should I keep all sub-packages on a single version in package.json?

There is a 3rd-party library my project uses that has split its functionality into multiple imported packages so that a project can install just what it needs. In package.json, several entries are present for the different sub-packages, like...
"dependencies": {
"#lib/dogs": "^1.0.3",
"#lib/cats": "^1.0.3",
"#lib/iguanas": "^1.0.3"
...lots more of the same...
}
I don't want to spend time thinking about compatibility issues if one of the sub-packages ends up on a different version number than the others, whether through semver range resolution or another developer fixing a problem by bumping the version of just one sub-package. I suspect there is some risk of bugs if the sub-package versions get out of sync, even if the package maintainers intend to respect the meaning of breaking changes in their versioning. It seems simpler to just keep all the sub-packages on the same version by default.
Should I try to enforce (or at least promote) that the sub-packages have the same version?
Promote, but don't enforce.
Your current set-up, which uses caret ranges, is the default used when installing with the --save flag for a reason: it's the most flexible and robust range to use for dependencies that correctly follow the semver conventions. This means that whenever someone updates your module as a dependency of theirs, it will automatically bump their sub-dependencies to the latest version that is backwards-compatible with the one explicitly specified after the ^ (for example, "^1.0.3" allows any version >= 1.0.3 and < 2.0.0).
Because of this, and the fact that scoped packages don't have interdependencies (they behave identically to normal dependencies), leaving identical caret ranges for each of them should already be sufficient to avoid compatibility issues by default.
Don't protect developers from themselves
A good methodology to follow when considering how to deal with compatibility issues is to avoid the antipattern of "protecting developers from themselves." In this situation, you propose to put a lock in place that prevents 3rd parties from editing the relative versions of your dependencies, to avoid compatibility issues. This is a very vague goal since you haven't actually run into any problems yet, as you've pointed out.
Sometimes, yes, developers might not know what they're doing, in which case they'll probably avoid tampering with your default dependency versions, but sometimes they do know, and it can be frustrating when a developer knows they can resolve a bug and are unnecessarily prevented from doing so. So hold their hand, don't cuff them.
npm already chose to avoid this antipattern; you should too.
If a 3rd-party developer chooses to use your module as a dependency, they should have the default amount of freedom available to manage their sub-dependencies through npm by using features like package-lock.json, which unlocks a very clean pattern for precisely managing sub-dependency versions without editing the source code of their dependencies.
In conclusion, what you have now is a very clean and flexible approach, following common conventions and not going out of the way to constrain 3rd-party developers.
