I have a Python package that depends on PyTorch, and I'd like Windows users to be able to install it via pip (the specific package is https://github.com/mindsdb/lightwood, but I don't think this is very relevant to my question).
What are the best practices for going about this?
Are there some projects I could use as examples?
It seems like the PyPI-hosted versions of torch and torchvision aren't Windows-compatible, and the "getting started" section suggests installing from the custom PyTorch repository, but beyond that I'm not sure what the ideal solution would be for incorporating this into a setup script.
If your project depends on other projects that are not distributed through PyPI then you have to inform the users of your project one way or another. I recommend the following combination:
clearly specify (in your project's documentation pages, in the project's long description, in the README, or anywhere similar) which dependencies are not available through PyPI (and possibly the reason why, with the appropriate links), as well as the possible locations to get them from;
to facilitate the user experience, publish alongside your project a pre-prepared requirements.txt file with the appropriate --find-links options.
The main reason (there are others) is that anyone using pip assumes that, by default, everything will be downloaded from PyPI and nowhere else. In other words, anyone using pip puts some trust in pypi.org as a source for Python project distributions. If pip were suddenly to download artifacts from other sources, it would breach this trust. It should be the user's decision to download from other sources.
So you could provide in your project's documentation an example requirements.txt file like the following:
# ...
--find-links https://download.pytorch.org/whl/torch_stable.html
torch===1.4.0
torchvision===0.5.0
# ...
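A user who reviews and accepts those extra download locations can then install everything with a single command:

pip install -r requirements.txt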
Update
The best solution would be to help the maintainers of the projects in question to publish Windows wheels on PyPI directly:
https://github.com/pytorch/pytorch/issues/24310
https://github.com/pytorch/vision/issues/1774
https://pypi.org/help/#file-size-limit
Related
PyCharm offers to synchronize imported packages (e.g. openpyxl) with repositories.
Is it good practice to sync these (even though they are imported standard packages)?
Thanks
The answer is no.
A virtual environment need not be replicated, as the packages and their versions can be listed with the pip freeze command into a text file called requirements.txt that should be shared.
Others can use this file to install the same libraries.
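In practice this is a two-command workflow (the file name is just the convention):

pip freeze > requirements.txt
pip install -r requirements.txt

The first command records the exact versions installed in the current environment; the second recreates that same set in a fresh virtual environment on another machine.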
We would like to install Spark-Alchemy to use it within PySpark inside Foundry (we would like to use its HyperLogLog functions). While I know how to install a pip package, I am not sure what is needed to install this kind of package.
Any help or alternative solutions related to the use of HyperLogLog with PySpark will be appreciated, thanks!
PySpark Transform repositories in Foundry are connected to conda. You can use the conda_recipe/meta.yml to pull packages into your transforms. If a package you want is not available in your channels, I would recommend you reach out to your administrators to ask if it's possible to add it. Adding a custom jar that extends Spark is something that needs to be reviewed by your platform administrators, since it can represent a security risk.
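For reference, run-time dependencies are declared in that recipe file; a hypothetical excerpt (package names are illustrative, not a confirmed Foundry configuration) might look like:

# conda_recipe/meta.yml (excerpt; names illustrative)
requirements:
  run:
    - python
    - pyspark
    - some-package  # must be available in a channel your instance can reach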
I did a $ conda search spark-alchemy and couldn't find anything related, and reading through these instructions https://github.com/swoop-inc/spark-alchemy/wiki/Spark-HyperLogLog-Functions#python-interoperability makes me guess that there isn't a conda package available.
I can't comment on the use of this specific library, but in general Foundry supports Conda channels, and if you have a Conda repo and configure Foundry to connect to that channel, you can add this library or others and reference them in your code.
Not sure SO is the best place to ask this, but it is development-related, so maybe someone can help.
I've written an app (in Python, but that's not important) which parses a Yum repo database to collate RPM packages and their dependencies. The problem I have is that I am pulling in too many packages when a dependency is met by more than one.
Specific example: I am seeking the list of packages which meet the dependencies for Java-1.8.0 and getting a dependency of libjli.so()(64bit). My code correctly works out that this is provided by multiple -devel packages from the Java 1.8, 1.7 and 1.6 streams. Unfortunately, all three versions (and their dependencies) then get included in my list.
I guess my question is: given a list of packages meeting a requirement, what is the best way to identify the most appropriate package to include? I.e., when resolving the dependencies for Java-1.8.0, only include the -devel package for 1.8.0 and not pull in the -devel packages for 1.6 and 1.7 as well.
I know this is a problem with my code; I'm just not sure what facilities the yum ecosystem provides to help me identify which package would be best to include from a list of multiple candidates.
It is hard to tell without seeing your code.
Yum is dead. If you are developing something new, you should develop on top of DNF. DNF uses a SAT-solver algorithm (https://doc.opensuse.org/projects/satsolver/11.4/index.html), and you can use libdnf https://github.com/rpm-software-management/libdnf (formerly known as libhif, formerly known as libhawkey).
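As a minimal sketch of the DNF approach (using the documented dnf Python API; treat the exact calls as an assumption and verify against your dnf version), you can hand the whole "pick the best provider" problem to the solver instead of enumerating providers yourself:

# Python: let DNF's libsolv-based resolver choose among multiple providers
import dnf

base = dnf.Base()
base.read_all_repos()   # load the enabled repository definitions
base.fill_sack()        # fetch and parse repository metadata

# Ask for one package; resolve() lets the SAT solver pick a single best
# provider for each dependency (e.g. one -devel package, not all three).
base.install("java-1.8.0-openjdk-devel")  # illustrative package name
base.resolve()
for pkg in base.transaction.install_set:
    print(pkg)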
I am trying to figure out how to add a GitHub project to my simple, working Launchpad PPA package. The GitHub project that I am trying to add is https://github.com/compiz-reloaded/compiz-boxmenu. I couldn't find much help online, and I'm hoping that someone can point me in the right direction on how to accomplish this. Thanks!
You need to 'debianize' your package first. The debianization depends on the package type, and the manual for package debianization is called the Debian Policy.
This wiki is also very useful. Once you have your package debianized, you should build it with the source option (I usually do it using dpkg-buildpackage -S). Pass your key using -k<Key> as well; use the same key you uploaded to your Launchpad account.
Once you have built your source package, you will find a file called package_version.changes. You basically upload it as described in your PPA information. The package will be compiled and, if no errors are found, it will be available in the PPA. If you want to enable builds for other architectures, such as IBM POWER (ppc64el) or ARM (aarch64), you have to opt in.
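Concretely, the build-and-upload step looks something like the following (the key ID, PPA name, and file name are placeholders):

dpkg-buildpackage -S -k<KeyID>
dput ppa:<your-user>/<your-ppa> package_version_source.changes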
I'm developing an application using Python 3. What is the best practice for using third-party libraries, both during development and for end-user distribution? Note that I'm working within these constraints:
Developers in the team should have the exact same version of the libraries.
An ideal solution would work on both Windows and Linux.
I would like to avoid making the user install software before using our own; that is, they shouldn't have to install product A and product B before using ours.
You could use setuptools to create egg files for your libraries, assuming they aren't available in egg form already. You could then bundle the eggs alongside your software, which would need to either install them, or ensure that they were on the import path.
This has some complexities, e.g. if your libraries have C extensions then your eggs become platform-specific, but in my experience this is the most widely accepted means of 'bundling' stuff in Python.
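A minimal sketch of the "ensure they're on the import path" option (the directory layout and the final import are hypothetical): at startup, prepend any bundled eggs to sys.path, since eggs are zip archives Python can import from directly:

# Python: bootstrap bundled eggs onto the import path
import glob
import os
import sys

HERE = os.path.dirname(os.path.abspath(__file__))
for egg in glob.glob(os.path.join(HERE, "eggs", "*.egg")):
    sys.path.insert(0, egg)   # eggs are directly importable from sys.path

import some_bundled_lib       # hypothetical library shipped as an egg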
I have to say that this remains one of Python's weaknesses, though; the third-party ecosystem is certainly aimed at developers rather than end-users.
There are no best practices, but there are a few different tracks people follow. With regard to commercial product distribution there are the following:
Manage Your Own Package Server
With regard to your development process, it is typical to have your dev boxes update from a local package server. That allows you to "freeze" the dependency list (i.e. stop getting upstream updates) so that everyone is on the same version. You can update at particular times and have the developers update as well, keeping everyone in lockstep.
For customer installs you usually write an install script. You can collect all the packages and install your libs, as well as the others, at the same time. There can be issues with trying to install a new Python, or even any standard library, because the customer may already depend on a different version. Usually you can install in a sandbox to separate your packages from the system's packages. This is more of a problem on Linux than on Windows.
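One way to sandbox with a modern Python 3 install script (a sketch; the target path and package names are assumptions) is to create a private virtual environment and install only from wheels you ship alongside the installer:

# Python: create a private environment so system packages are untouched
import subprocess
import venv

TARGET = "/opt/myapp/env"              # assumed install prefix
venv.create(TARGET, with_pip=True)     # isolated from the system Python
subprocess.check_call([
    TARGET + "/bin/pip", "install",    # on Windows, pip lives under Scripts/
    "--no-index", "--find-links", "./vendor",  # offline: bundled wheels only
    "myapp",                           # hypothetical product package
])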
Toolchain
The other option is to create a toolchain for each supported OS. A toolchain is all the dependencies (up to, but not including, base OS libs like glibc). This toolchain gets packaged up and distributed to both developers AND customers. Best practices for a toolchain are:
change the executable name to prevent confusion (e.g. python -> pkg_python)
don't install in .../bin directories, to prevent accidental usage (e.g. on Linux you can install under .../libexec; /opt is also used, although personally I detest it)
install your libs in the correct location under lib/python/site-packages so you don't have to use PYTHONPATH.
Distribute the source .py files for the executables so the install script can relocate them appropriately.
The package format should be an OS-native package (RedHat -> RPM, Debian -> DEB, Windows -> MSI)
For developers, use pip with a requirements file.
For end users, specify requirements in setup.py.
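For the end-user side, a sketch of what that setup.py can look like (the project name and version ranges are illustrative):

# setup.py: declare runtime requirements for end-user installs
from setuptools import setup, find_packages

setup(
    name="myapp",                      # illustrative project name
    version="1.0.0",
    packages=find_packages(),
    install_requires=[
        "requests>=2.20,<3",           # looser ranges than the frozen dev file
    ],
)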