Cargo dependency conflict with multiple git submodules - rust

We have a Rust Cargo project under Git that we have divided into multiple submodules each under their own git repo to allow controlled access by different teams, some external. Some teams will just work on one sub-module. Here is a simplified structure:
Project
---Module 1
---Sub-Module 1
---Sub-Module 2
---Sub-Module 3
Module 1 has a dependency on all 3 sub-modules; and Sub-Module 1 and Sub-Module 2 also have dependencies on Sub-Module 3.
The advantage of using sub-modules is that changes can be made to Module 1 and the sub-modules together and compiled together, as opposed to just keeping the sub-modules as separate repos and developing them separately.
Since Sub-Module 1 and Sub-Module 2 are independent repos, they have no direct knowledge of Sub-Module 3 and therefore must include it via the git repo.
Module 1 is including Sub-Module 3 as a direct path. This causes the conflict in Cargo as Module 1 has two versions of Sub-Module 3 - one direct dependency and one through Sub-Module 1 / Sub-Module 2.
Solution A would be to include Sub-Module 3 into Module 1 via the git repo (instead of via a direct path), but this defeats the object of having it as a submodule as any coding changes to Sub-Module 3 would have to be coded, committed and pushed to the repo before Module 1 can see them.
Solution B would be to add Sub-Module 3 itself as a sub-module to Sub-Module 1 and Sub-Module 2, which would remove the need to define the dependency via the git repo. But then Sub-Module 3 would appear twice in the Project and this might get confusing. Also, we have not tested this but suspect that Cargo would still have the same conflict, as it would still see two versions of Sub-Module 3.
This is the type of error being produced by Cargo:
= note: expected struct sub_module_3::ExampleStruct
found struct ExampleStruct
= note: perhaps two different versions of crate `sub_module_3` are being used?
Any advice on how to solve this much appreciated.
Thanks

To normalize the dependency graph where some crates are accessing each other by version or repository and others are accessing the same crates by path, you need to override the nested dependencies with a [patch] section in your Cargo.toml:
[patch.'https://github.com/sub-module-3'] # sub-module-1 and sub-module-2 use this dependency
sub-module-3 = { path = '../sub-module-3' } # they should instead use the local path
This can either be done just for the module-1 crate or at the workspace root if you want to have the patches applied for developing the sub-modules.
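As a rough sketch of how the pieces fit together, Module 1's manifest might look like the following. The repository URL `https://github.com/example/sub-module-3` and the directory names are placeholders assumed for illustration, not taken from the question. The key of the [patch] table has to be the same URL that Sub-Module 1 and Sub-Module 2 use in their own git dependency declarations.

```toml
# module-1/Cargo.toml (hypothetical layout: all crates are sibling directories)

# Assumed for illustration: sub-module-1 and sub-module-2 each declare
#   sub-module-3 = { git = "https://github.com/example/sub-module-3" }
# in their own Cargo.toml, since they do not know about the local checkout.

[dependencies]
sub-module-1 = { path = "../sub-module-1" }
sub-module-2 = { path = "../sub-module-2" }
sub-module-3 = { path = "../sub-module-3" }

# Redirect every use of the git source to the local checkout, so the whole
# dependency graph resolves to a single copy of sub-module-3.
[patch."https://github.com/example/sub-module-3"]
sub-module-3 = { path = "../sub-module-3" }
```

With this in place, Sub-Module 1 and Sub-Module 2 keep their git dependency unchanged, so teams working on them in isolation are not affected.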

Related

Publish only one parent crate on multi-crate project

I am creating a library that is close to its first release, so I would like to upload it to crates.io. The library has a multi-crate design, so I ended up with something like:
- CrateA
- CrateProcMacros
- CrateC
- CrateD
- CrateE
- CrateF
- Cargo.toml (handles the workspace)
- Cargo.lock
...
where CrateA is the parent of the other crates and depends on some of those local crates, and some of those crates in turn depend on others. That is, it's the primary crate of the library, the one responsible for exposing the public API of the project, and the only one that I would like to publish on crates.io.
Reading the Cargo docs, I see that I won't be able to publish just a single crate to the registry; all of them will have to be uploaded and published.
So, what alternatives do I have for publishing only CrateA to the registry? Should I change my project's structure, move all the other packages into CrateA and then try to publish it? Or is there some other way to achieve this?
EDIT
CrateA has direct dependencies on other crates, and those in turn depend on other crates inside my workspace.
The way Cargo packaging works is that you are publishing your source code nearly unchanged. There is no pre-compilation step. There is no step where multiple library crates are gathered into one package. The only way to publish your CrateA is to publish all of its dependencies too.
There is interest in making a multi-crate project easier to publish, but for now, you've got to do it all explicitly.
Make sure each package in your project declares a [package] name that makes sense in public. (The name of the directory you keep it in doesn't matter.) It's common to have names like myproject-partoftheproject, where the package people actually use normally would be named myproject.
Make sure that each dependency declaration has a version number (not just a path) matching what you're going to publish. (You don't have to remove the path; that will be done for you during publication.)
Publish each package. You must do this in dependency order, with dependencies first, so that CrateA goes last.
No one will mind that you've published extra packages that aren't meant for direct use — for example, lots of libraries necessarily have separate proc-macro packages. Though, if you have any crates that are really just for code organization and don't have any particular benefit, you could consider making them into modules inside fewer crates.
The root Cargo.toml should look like this
[workspace]
members = [
"CrateA",
"CrateB",
...
]
And CrateA/Cargo.toml should be like this
[package]
name = "Foo"
version = "0.0.0"
edition = "2021"
authors = ["Foo <Foo@gmail.com>"]
license = "Bar"
description = "Baz"
[dependencies]
CrateB = { path = "../CrateB", version = "0.0.0" }
CrateC = "0.0.0"

Project-specific override for Cargo

I primarily want to use Debian's Rust packages, rather than fetching some random code from the wider Internet (I'm old-fashioned, I know, let's not get into that part). To this end, my ~/.cargo/config.toml looks like
[net]
offline = true
[source]
[source.crates-io]
replace-with = "debian"
[source.debian]
directory = "/usr/share/cargo/registry"
This works great after I install the librust-*-dev packages that I desire. However, in some specific projects, I'd like to lift this rule and tell Cargo "hey, you can in fact go wild and get whatever you want from crates.io". According to the Cargo book, a project-specific /project/.cargo/config.toml should take precedence over my user one. Assume this project-specific .cargo/config.toml:
[net]
offline = false
[source]
[source.crates-io]
I'm still not able to cargo build a project with dependencies from outside of my replacement source. If for example, I make a Cargo.toml that depends on yew (a randomly chosen crate that I know isn't available in my replacement source) I get
$ cargo build
error: no matching package found
searched package name: `yew`
What am I misunderstanding about Cargo's sources, replacement and per-project overrides?
The answer suggested by @blackgreen is one possible workaround for the underlying problem until issues 10045 and 10057 (or a combination thereof) are solved. Another, perhaps slightly less ugly, workaround follows below for those who need it.
I ended up working around the problem using UnionFS (I guess the more modern OverlayFS should work well too).
I simply add
[source.crates-io]
replace-with = "union"
[source.union]
directory = "/home/gspr/.cargo-overlay/union-registry"
to my ~/.cargo/config.toml and then do
unionfs -o ro /usr/share/cargo/registry:/home/gspr/.cargo-overlay/local-registry /home/gspr/.cargo-overlay/union-registry
Now /home/gspr/.cargo-overlay/union-registry reflects the union of /usr/share/cargo/registry and /home/gspr/.cargo-overlay/local-registry, with priority to the former in case of conflicts.
So what goes in ~/.cargo-overlay/local-registry? Individual extra crates, in the same way as in Debian's /usr/share/cargo/registry. That is to say, directories named cratename-version as they are distributed by upstream – but with a single extra file, namely .cargo-checksum.json added to them. The content of that extra file can be extracted from the crates.io index as follows.
Suppose we have cloned the crates.io index into ~/.cargo-overlay/crates.io-index, i.e.
git clone https://github.com/rust-lang/crates.io-index.git ~/.cargo-overlay/crates.io-index
Then suppose we've extracted a crate foo at version 0.1.2 into ~/.cargo-overlay/local-registry/foo-0.1.2. We can generate the missing .cargo-checksum.json like so:
cd ~/.cargo-overlay
index_file=$(find crates.io-index -type f -name foo)
cksum=$(jq -r "select(.name == \"foo\" and .vers == \"0.1.2\" ) | .cksum" ${index_file})
jo package="${cksum}" files="{}" > local-registry/foo-0.1.2/.cargo-checksum.json
It looks as if you are suffering from this issue: https://github.com/rust-lang/cargo/issues/8687
You would like to unset a config key set in an upper-level config.toml, but this is not supported.
I've played a bit with the config, and the only way I got it to work was to overwrite in the project-local config.toml the properties that were set in the upper-level config.toml.
In your case your upper-level config.toml specifies replace-with, so you have to overwrite that. But you can't overwrite it with crates-io, which is the registry you want to use, because that is exactly the registry with the replace-with key.
So until the above issue gets acted upon, we have to, essentially, use a mirror, both in the config and as an actual registry to download from:
[net]
offline = false
[source]
[source.crates-io]
replace-with = "crates-io-mirror"
[source.crates-io-mirror]
registry = "https://gitlab.com/integer32llc/crates.io-index"
As we both tested, it seems it's not possible to reuse the normal crates.io registry url because that is already defined and will fail with:
error: source crates-io-mirror defines source registry https://github.com/rust-lang/crates.io-index, but that source is already defined by crates-io
note: Sources are not allowed to be defined multiple times.
So instead the URL above is an actual mirror server of crates.io. Then you can run cargo build successfully in the local project.
The recently released Cargo 1.56 adds a feature that should let one do what my question asks for: patch tables can now be specified in a project-specific .cargo/config.toml, which means that [patch] stanzas can now be introduced outside of Cargo.toml. That should do the trick! I haven't yet verified this, as I am stuck with an older Cargo for a little while still.

NPM local package install

This might well be an ill-conceived idea, but I have two React projects in version control. The first, let's call it A, contains a component I want to use in B. B therefore has a dependency on A declared in the package.json for B, as a file: ...path to project A.
The problem is that in order to build project B, the user needs to load both A and B onto their disk, then build A (this is a rollup build) and then build B. Because (I think) B depends on A with a file: reference, when installing A, NPM copies the whole directory including the node_modules folder of A under B. So you end up with B/node_modules/A/node_modules
I think the issue is that we are using the file system location of project A as both the source code location and the registry location if that makes sense. Perhaps we need to publish project A somewhere when it gets built and declare that location to be the dependency?
I hope that makes sense.
I looked at the docs and it seems like node_modules should always be ignored on an install if I am understanding the files section at all.
A common pattern for reusing components between projects is to move the component (or any piece of code that needs to be reused) out of both A and B and create a standalone npm module containing that code.
Publish that module to either a public or a private npm registry, depending on the sensitivity of the code, and let both projects A and B install their own instance of that npm module. Otherwise these interdependencies will only cause you headaches in the long run.

How do I deal with puppet modules with classes of the same name?

I have a puppet module that uses gini-archive. Recently I changed my module to depend on biemond-wildfly, which depends on nanliu-archive.
However, I can't install nanliu-archive, because both of these archive modules install into a directory called archive. This, I believe, violates the puppet module requirements, as they should both install into directories called <username>-archive.
However, even if I put them in different directories, I still have a problem. Both classes are called archive (actually one is a class and one is a define, but I don't think that's too important right now), so when my module says include archive, puppet isn't going to know which one I want.
Note I have a Java background where every class is in a package hierarchy, which prevents these kinds of issues, but I can't see any equivalent for puppet.
I know I could have a whole load of different modules directories (/etc/puppet/modules, /etc/puppet/modules2 etc), but puppet still seems to look through these in order, meaning it will always load the archive class from the first module directory in the list.
Is there any way of solving this or have I reached the limit of what puppet can do? I'd rather not have to fork every single module and change the class names, that seems to defeat the point of the forge.
Thanks.
The name of the directory the module is in must be archive, the username is only used for the purpose of distributing and packaging modules but is not used by puppet while autoloading. Basically, what you are seeing is correct.
There seem to be two ways of handling this:
Fork one of the two archive modules and rename the module so that it does not collide
Fork one of the modules using the archive modules and migrate it to use the same archive module as the other one. Since the two archive modules do almost the same thing, I prefer this method.
I just did this, so I'm going to expand a bit on option (1) in @ChrisPitman's answer by including more details, using a module I just forked and renamed as an example.
(Unfortunately) the simplest solution is to fork one of the modules and rename it. Below is an example using puppet/selinux and thias/selinux which have a namespace collision at selinux. The following steps were taken to re-namespace the thias/selinux module into the namespace selinux_thias:
Fork the module. In this example I have created USF-IMaRS/puppet-selinux from thias/puppet-selinux.
Install the module into modules/$NEW_NAME. Using git submodules this is: git submodule add https://github.com/USF-IMARS/puppet-selinux modules/selinux_thias
Rename the module class(es). Here is a commit demonstrating what this basically looks like.
Modify the modules using thias/selinux to use the new name selinux_thias instead of selinux.

launchpad.net: Multiple dependencies in the same large project...?

I have a large project which contains many libraries that the main binary depends on. I would like to know the proper way to handle this in launchpad so I can build the libraries, then the main binary, and offer each debian package on a ppa.
You can see the project in question at lp:snapcpp (https://code.launchpad.net/snapcpp/). In snapcpp, we have "snapwebsites", a C++ CMS which attaches to a Cassandra database via our library "libQtCassandra." "snapwebsites" depends on libQtCassandra, as it does on libltd and others. Each of these libraries needs to be a separate debian package. Each project has its own "debian" folder but there is no root debian folder at this time.
How can I get this to work on launchpad, which requires a root debian folder? Do I need to construct a debian project at the root that lists each dependency? If not, do I need to break up each project into its own branch using bzr? If I do the latter, how do I call out those dependency debs for the build (in other words, how do I tell the recipe for snapwebsites that it needs to have libQtCassandra and its dependency packages installed)?
Thanks!
The solution that I discovered on my own was to utilize the recipe command "nest-part," which allows you to take a single folder out of a bzr branch and map it into your project. It cannot, however, map to the root of your branch.
What I did was to create a branch with only packaging information in it, and a CMakeLists.txt file containing "add_subdirectory(src)". Then I map from the main code branch (lp:snapcpp), but only the project in question. For example, here is the recipe for the "controlled_vars" project in snapcpp:
# bzr-builder format 0.3 deb-version {debupstream}+{revno}
lp:~snapcpp/snapcpp/controlled_vars
nest-part src lp:snapcpp controlled_vars src
There does need to be a branch with packaging information with each sub-project, but this is a one-time set up issue.

Resources