Project-specific override for Cargo - dependency-management

I primarily want to use Debian's Rust packages, rather than fetching some random code from the wider Internet (I'm old-fashioned, I know, let's not get into that part). To this end, my ~/.cargo/config.toml looks like
[net]
offline = true
[source]
[source.crates-io]
replace-with = "debian"
[source.debian]
directory = "/usr/share/cargo/registry"
This works great after I install the librust-*-dev packages that I desire. However, in some specific projects, I'd like to lift this rule and tell Cargo "hey, you can in fact go wild and get whatever you want from crates.io". According to the Cargo book, a project-specific /project/.cargo/config.toml should take precedence over my user one. Assume this project-specific .cargo/config.toml:
[net]
offline = false
[source]
[source.crates-io]
With this in place, I'm still not able to cargo build a project with dependencies from outside my replacement source. For example, if I make a Cargo.toml that depends on yew (a randomly chosen crate that I know isn't available in my replacement source), I get
$ cargo build
error: no matching package found
searched package name: `yew`
What am I misunderstanding about Cargo's sources, replacement and per-project overrides?

The answer suggested by @blackgreen is one possible workaround for the underlying problem until issues 10045 and 10057 (or a combination thereof) are solved. Another, perhaps slightly less ugly, workaround follows below for those who need it.
I ended up working around the problem using UnionFS (I guess the more modern OverlayFS should work well too).
I simply add
[source.crates-io]
replace-with = "union"
[source.union]
directory = "/home/gspr/.cargo-overlay/union-registry"
to my ~/.cargo/config.toml and then do
unionfs -o ro /usr/share/cargo/registry:/home/gspr/.cargo-overlay/local-registry /home/gspr/.cargo-overlay/union-registry
Now /home/gspr/.cargo-overlay/union-registry reflects the union of /usr/share/cargo/registry and /home/gspr/.cargo-overlay/local-registry, with priority to the former in case of conflicts.
So what goes in ~/.cargo-overlay/local-registry? Individual extra crates, in the same way as in Debian's /usr/share/cargo/registry. That is to say, directories named cratename-version as they are distributed by upstream – but with a single extra file, namely .cargo-checksum.json added to them. The content of that extra file can be extracted from the crates.io index as follows.
Suppose we have cloned the crates.io index into ~/.cargo-overlay/crates.io-index, i.e.
git clone https://github.com/rust-lang/crates.io-index.git ~/.cargo-overlay/crates.io-index
Then suppose we've extracted a crate foo at version 0.1.2 into ~/.cargo-overlay/local-registry/foo-0.1.2. We can generate the missing .cargo-checksum.json like so:
cd ~/.cargo-overlay
index_file=$(find crates.io-index -type f -name foo)
cksum=$(jq -r "select(.name == \"foo\" and .vers == \"0.1.2\" ) | .cksum" ${index_file})
jo package="${cksum}" files="{}" > local-registry/foo-0.1.2/.cargo-checksum.json
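
For anyone without jq and jo at hand, the same lookup can be sketched in Python. This is only an illustration under the assumptions stated above: the index file holds one JSON object per line with name, vers and cksum fields. The function name and the sample index content are made up for the example.

```python
import json

def checksum_file_contents(index_text: str, name: str, version: str) -> str:
    """Return the text of a .cargo-checksum.json for one crate version.

    index_text is the content of the crate's file in a crates.io-index
    checkout: one JSON object per line, each with "name", "vers", "cksum".
    """
    for line in index_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        entry = json.loads(line)
        if entry["name"] == name and entry["vers"] == version:
            # Same shape as the jo command above: checksum plus empty "files".
            return json.dumps({"package": entry["cksum"], "files": {}})
    raise LookupError(f"{name} {version} not found in index file")

# Made-up index content, for illustration only (not real checksums).
sample_index = "\n".join([
    json.dumps({"name": "foo", "vers": "0.1.1", "cksum": "aaaa"}),
    json.dumps({"name": "foo", "vers": "0.1.2", "cksum": "bbbb"}),
])
print(checksum_file_contents(sample_index, "foo", "0.1.2"))
```

The returned string would then be written to local-registry/foo-0.1.2/.cargo-checksum.json.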

It looks as if you are suffering from this issue: https://github.com/rust-lang/cargo/issues/8687
You would like to unset a config key that is set in an upper-level config.toml, but this is not supported.
I've played a bit with the config, and the only way I got it to work was to overwrite in the project-local config.toml the properties that were set in the upper-level config.toml.
In your case your upper-level config.toml specifies replace-with, so you have to overwrite that. But you can't overwrite it with crates-io, which is the registry you want to use, because that is exactly the registry with the replace-with key.
So until the above issue gets acted upon, we have to, essentially, use a mirror, both in the config and as an actual registry to download from:
[net]
offline = false
[source]
[source.crates-io]
replace-with = "crates-io-mirror"
[source.crates-io-mirror]
registry = "https://gitlab.com/integer32llc/crates.io-index"
As we both tested, it seems it's not possible to reuse the normal crates.io registry url because that is already defined and will fail with:
error: source crates-io-mirror defines source registry https://github.com/rust-lang/crates.io-index, but that source is already defined by crates-io note: Sources are not allowed to be defined multiple times.
So instead the URL above is an actual mirror server of crates.io. Then you can run cargo build successfully in the local project.
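
To make the "you can override but not unset" point concrete, here is a rough Python model of how the two config files combine. It is a simplification for illustration, not Cargo's actual implementation: it only assumes that tables are merged key by key and that the value from the deeper (project) config wins. The function name merge_config is made up.

```python
def merge_config(outer: dict, inner: dict) -> dict:
    """Merge two config tables; values from `inner` win on conflicts."""
    merged = dict(outer)
    for key, value in inner.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Tables are merged recursively rather than replaced wholesale.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged

# The user-level config from the question.
user_config = {
    "net": {"offline": True},
    "source": {
        "crates-io": {"replace-with": "debian"},
        "debian": {"directory": "/usr/share/cargo/registry"},
    },
}
# The project-level config from the question: an empty [source.crates-io].
project_config = {
    "net": {"offline": False},
    "source": {"crates-io": {}},
}

effective = merge_config(user_config, project_config)
print(effective["net"]["offline"])        # the override takes effect
print(effective["source"]["crates-io"])   # replace-with is still inherited
```

In this model the empty [source.crates-io] table adds nothing, so replace-with = "debian" survives the merge. The only way to get rid of it is to set it to something else, which is what the mirror trick does.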

The recently released Cargo 1.56 adds a feature that should let one do what my question asks for: patch tables can now be specified in a project-specific .cargo/config.toml, which means that [patch] stanzas can now be introduced outside of Cargo.toml. That should do the trick! I haven't yet verified this, as I am stuck with an older Cargo for a little while still.

Related

Is there any difference on how to declare a ClojureScript dependency on shadow-cljs.edn file?

I have been working on a Clojure/ClojureScript project and something intrigues me.
In the shadow-cljs.edn file, there is a declaration of the dependencies. As you can see below, some of them are declared with a "full name" of the form username/repository-name. An example is venantius/accountant.
Others are declared only as repository-name, such as [bidi "2.1.5"], which is actually published by the juxt user (source).
I am afraid this could be problematic since multiple users could create repositories with the same name:
{:source-paths ["src" "dev" "test"]
:dependencies [
;; for deploy w lein deps below need to be in project.cljs
;; third-party dependencies
[venantius/accountant "0.2.5"]
[bidi "2.1.5"]
[cljs-hash "0.0.2"]
[clova "0.46.0"]
[com.andrewmcveigh/cljs-time "0.5.2"]
[org.clojure/core.match "1.0.0"]
[binaryage/dirac "RELEASE"]
[com.pupeno/free-form "0.6.0"]
[garden "1.3.10"]
[hickory "0.7.1"]
[metosin/malli "0.8.4"]
[medley "1.4.0"]
[binaryage/oops "0.7.0"]
[djblue/portal "0.16.1"]
[djblue/portal "0.18.0"]
[proto-repl "0.3.1"]
[reagent "1.1.0"]
[re-frame "1.2.0"]
[district0x/re-frame-window-fx "1.1.0"]
[cljsjs/react-beautiful-dnd "12.2.0-2"]
I am not sure how dependency installation works at a low level in a Clojure/ClojureScript project.
Is it bad practice to use only the short name of a dependency? Is an ambiguity problem likely, or even possible?

Until not too long ago it was allowed to publish dependencies to https://clojars.org without a group name. In those cases the group would become identical to the artifact id. So bidi is effectively bidi/bidi.
Nowadays, new packages may only be published with a specific group name. However, old packages may continue using their older name.
The names used to publish also do not need to match their github repo coordinates. These are separate systems. They often match but are not required to.
To answer your question: you should avoid declaring the same dependency multiple times, and you should use the officially published name for each library. Some libraries are still updated under their old identifiers. Some have moved to the newer, longer names, while the old ones remain available but no longer receive updates. Always consult the documentation of the specific libs to be sure which one you are supposed to use. They'll usually have some kind of info in their READMEs.
Conflicts may happen if you get the "same" lib via different identifiers. These can be very difficult to track down when you run into trouble. This is true for any dependency resolver you use (e.g. project.clj, deps.edn, shadow-cljs.edn). Best practice is to keep your dependencies as clean as possible.
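
As a small illustration of the naming rule (a bare name means group and artifact are identical), here is a Python sketch that expands short names and flags dependencies listed more than once. The helper names are made up, and the dependency list is a subset of the one in the question. Note that this cannot detect the same library published under two different identifiers; for that you still need to check each project's README.

```python
from collections import defaultdict

def canonical(name: str) -> str:
    """Expand a bare artifact name to group/artifact (bidi -> bidi/bidi)."""
    return name if "/" in name else f"{name}/{name}"

def duplicated(deps):
    """Return {canonical name: versions} for names listed more than once."""
    versions = defaultdict(list)
    for name, version in deps:
        versions[canonical(name)].append(version)
    return {n: v for n, v in versions.items() if len(v) > 1}

deps = [
    ("venantius/accountant", "0.2.5"),
    ("bidi", "2.1.5"),
    ("djblue/portal", "0.16.1"),
    ("djblue/portal", "0.18.0"),
]
print(canonical("bidi"))
print(duplicated(deps))
```

On the list from the question this reports djblue/portal, which is declared with both 0.16.1 and 0.18.0.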

Publish only one parent crate on multi-crate project

I am creating a library that is close to its first release, so I would like to upload it to crates.io. The library has a multi-crate design, so I ended up with something like:
- CrateA
- CrateProcMacros
- CrateC
- CrateD
- CrateE
- CrateF
- Cargo.toml (handles the workspace)
- Cargo.lock
...
where CrateA is the parent of the other crates: it depends on some of those local crates, and some of those crates in turn depend on others. It is the primary crate of the library, the one responsible for exposing the project's public API, and the only one that I would like to publish on crates.io.
Reading the Cargo docs, I see that I won't be able to publish just that one crate to the registry; all of them would have to be uploaded and published.
So, what alternatives do I have to publish only CrateA to the registry? Should I change my project's structure, move all the other packages into CrateA, and then try to publish it? Or is there some other way to achieve this?
EDIT
CrateA has direct dependencies on other crates, and those in turn depend on other crates inside my workspace.

The way Cargo packaging works is that you are publishing your source code nearly unchanged. There is no pre-compilation step. There is no step where multiple library crates are gathered into one package. The only way to publish your CrateA is to publish all of its dependencies too.
There is interest in making a multi-crate project easier to publish, but for now, you've got to do it all explicitly.
1. Make sure each package in your project declares a [package] name that makes sense in public. (The name of the directory you keep it in doesn't matter.) It's common to have names like myproject-partoftheproject, where the package people actually use normally would be named myproject.
2. Make sure that each dependency declaration has a version number (not just a path) matching what you're going to publish. (You don't have to remove the path; that will be done for you during publication.)
3. Publish each package. Dependencies must be published before the crates that depend on them, so CrateA goes last.
No one will mind that you've published extra packages that aren't meant for direct use — for example, lots of libraries necessarily have separate proc-macro packages. Though, if you have any crates that are really just for code organization and don't have any particular benefit, you could consider making them into modules inside fewer crates.
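
If the workspace grows, working out the publishing order by hand gets tedious. A minimal Python sketch of that ordering step, using the standard library's graphlib, could look like this. The dependency map is hypothetical (the question does not say which crate depends on which), and the function name is made up.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: crate -> crates it depends on.
workspace = {
    "CrateA": {"CrateProcMacros", "CrateC"},
    "CrateC": {"CrateD"},
    "CrateProcMacros": set(),
    "CrateD": set(),
}

def publish_order(deps):
    """Return the crates ordered so that each one follows its dependencies."""
    # TopologicalSorter takes node -> predecessors and yields predecessors first.
    return list(TopologicalSorter(deps).static_order())

order = publish_order(workspace)
print(order)
```

Each crate would then be published in that order, for example with cargo publish -p <name>, with CrateA coming out last.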

The top-level Cargo.toml should look like this
[workspace]
members = [
"CrateA",
"CrateB",
...
]
And CrateA/Cargo.toml should be like this
[package]
name = "Foo"
version = "0.0.0"
edition = "2021"
authors = ["Foo <Foo@gmail.com>"]
license = "Bar"
description = "Baz"
[dependencies]
CrateB = { path = "../CrateB", version = "0.0.0" }
CrateC = "0.0.0"

How do I integrate a fork of one of the crates I'm using into my project? [duplicate]

I am trying to configure my Rust project with an external dependency hosted on GitHub. Unfortunately, some recent commits changed its interfaces, so I am unable to use the latest version. The developers also do not bother with tags or separate branches for different versions, so I think the only correct way is to pin a specific commit whose interface matches what I worked with.
What I have now in Cargo.toml is:
[dependencies]
...
thelib = { git = 'https://github.com/someguys/thelib' }
I saw it is possible to specify a branch like this:
thelib = { git = 'https://github.com/someguys/thelib', branch = 'branch1' }
But I have not seen a working example with a commit. Could anybody provide one here?

As hinted in the Cargo.toml vs Cargo.lock section of the Cargo guide, you can use the rev property to specify a commit hash:
[...] If you build this package today, and then you send a copy to me, and I build this package tomorrow, something bad could happen. There could be more commits to rand in the meantime, and my build would include new commits while yours would not. Therefore, we would get different builds. This would be bad because we want reproducible builds.
We could fix this problem by putting a rev line in our Cargo.toml:
[dependencies]
rand = { git = "https://github.com/rust-lang-nursery/rand.git", rev = "9f35b8e" }
It is also mentioned in Specifying dependencies, although no example is given (emphasis mine):
Since we haven’t specified any other information, Cargo assumes that we intend to use the latest commit on the master branch to build our package. You can combine the git key with the rev, tag, or branch keys to specify something else. [...]

You can use the rev key to specify a commit hash. For example:
thelib = { git = "https://github.com/someguys/thelib", rev = "9f35b8e" }
It's briefly mentioned in this section of the Cargo book.

Shared library versioning with cmake on github

I have a fairly new project on github that produces a shared library. Going forward, I would like to use semantic versioning (as described at semver.org) for the shared library major/minor/patch numbers in the file name. The project uses CMake. The CMakeLists.txt file refers to CPACK_PACKAGE_VERSION_MAJOR, CPACK_PACKAGE_VERSION_MINOR and CPACK_PACKAGE_VERSION_PATCH, and sets these to default values if they are not passed in on the command line.
My plan is to branch on ABI changes and API additions, according to semantic versioning principles.
I know github has support for creating and naming release packages containing the project source based on git tags. But I do not see a way to propagate the major, minor and patch numbers to the shared library name when the github user builds a release on their machine.
For example, if I have a branch called myproj_1_2 and a release tag called myproj_rel_1_2_9, is there a way to have the shared library built by a user be named libmyproj.so.1.2.9?
Is this just a matter of documenting that a user should pass the build name information on the cmake command line, and then having the CMakeLists.txt file parse this and set CPACK_PACKAGE_VERSION_MAJOR, CPACK_PACKAGE_VERSION_MINOR and CPACK_PACKAGE_VERSION_PATCH accordingly, or is there a more elegant way to do this?

Your statement about how CPACK_PACKAGE_VERSION_XXX is set is incorrect. The CPack variables in question are set by the project command if the project command specifies versioning. So when you create the 1.2.9 branch you would set 1.2.9 as the version number in the project command.
From CPack Help
CPACK_PACKAGE_VERSION_MAJOR
Package major version. This variable will always be set, but its default value depends on whether or not version details were given to
the project() command in the top level CMakeLists.txt file. If version
details were given, the default value will be
CMAKE_PROJECT_VERSION_MAJOR. If no version details were given, a
default version of 0.1.1 will be assumed, leading to
CPACK_PACKAGE_VERSION_MAJOR having a default value of 0.
Project command
> project(<PROJECT-NAME>
> [VERSION <major>[.<minor>[.<patch>[.<tweak>]]]]
> [DESCRIPTION <project-description-string>]
> [HOMEPAGE_URL <url-string>]
> [LANGUAGES <language-name>...])
If you don't want to set the VERSION via the project command then there are multiple other ways of setting the relevant variables.
Examples are located:
https://cmake.org/cmake-tutorial/
Also look at how CMake handles versions:
https://gitlab.kitware.com/cmake/cmake/blob/master/Source/CMakeVersionSource.cmake
https://gitlab.kitware.com/cmake/cmake/blob/master/Source/cmVersionConfig.h.in
Another example of how to get git meta data for setting version related information:
https://github.com/pmirshad/cmake-with-git-metadata/blob/master/CMakeLists.txt
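
If you do end up deriving the version from the tag name rather than hard-coding it in project(), the parsing step itself is small. Here is a Python sketch of it, assuming tags follow the myproj_rel_<major>_<minor>_<patch> pattern from the question; the function name is made up, and how the result is fed into the build is left open.

```python
import re

def version_from_tag(tag: str):
    """Extract (major, minor, patch) from a tag like myproj_rel_1_2_9."""
    match = re.fullmatch(r".*_rel_(\d+)_(\d+)_(\d+)", tag)
    if match is None:
        raise ValueError(f"tag does not look like a release tag: {tag!r}")
    return tuple(int(part) for part in match.groups())

major, minor, patch = version_from_tag("myproj_rel_1_2_9")
print(f"libmyproj.so.{major}.{minor}.{patch}")
```

The three numbers are what would go into the VERSION argument of project().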

go_remote_library usage in Pants

I am currently attempting to use the go_remote_library target??, package??, plugin?? in Pants. Real simple question, here:
If in my code I have the import listed as:
import(
"github.com/golang/groupcache"
)
is it valid for me to specify a name of simply "groupcache" instead of the full import path? Here is what my BUILD file looks like:
go_remote_library(name="groupcache",
rev="d781998583680cda80cf61e0b37dd0cd8da2eb52"
)
Am I doing this right? As a side note, is there a Pants target that I can use to test that my BUILD file is valid? Thanks!

You are doing it right. All of the go targets - go_remote_library in this case, but also go_library and go_binary - currently take a name parameter and it must be the name of the directory the BUILD file lives in. The next release of pants (0.0.44) should remove the name parameter taking the choice away from you.
The 1st line of defense is the BUILD Dictionary.
For go_remote_library you'll find this doc.
As to testing, the simplest test is checking syntax, and for that this does the trick:
./pants list path/to/BUILD:
Note the trailing colon attached to the path.
This says "list all the targets defined in path/to/BUILD". Here the : means all; it's equivalent to the * wildcard in Bourne shells, but for pants targets in BUILD files.
If you want to check more targets all at once you could say:
./pants list ::
Here the recursive glob is used - equivalent to ** in zsh, and so this asks pants to list all the targets in the repo.
If the syntax checks out, you may still have more subtle issues, like defining a go_remote_library that does not point to a valid github project. These issues will only show up when you try to do more than act on the target's metadata like list and depmap goals do. For a go_remote_library, the simplest way to exercise it is to try and resolve the library:
./pants resolve 3rdparty/go/github.com/bitly/go-simplejson2
If you have this BUILD file contents at that path:
go_remote_library(name='go-simplejson2')
Running the resolve will fail since no such github repo exists.
You can do a similar higher-level check with go_library and go_binary targets by running ./pants compile ... instead. This will smoke out whether you're missing any required go_remote_library BUILD files or dependencies.
