Support several versions of a package if version change breaks code - python-3.x

I have a Python 3 project which uses the Biopython package. One of its modules was removed in the latest version, so I have to change a small piece of code to accommodate that change. On the other hand, this change would break my code for all "old" versions of Biopython (which are heavily used on production systems).
My questions:
What is the proper way to deal with this?
If this makes sense: How do I support old and new package versions at the same time? Do I perform a runtime check to see which version I have and then run different code? Or is this a bad idea? If you think this is the way to go: Is there a standard way to do this?

The simplest way to ensure a specific version is present is to pin that version in your requirements.txt file (or other dependency specifications). There are plenty of systems which rely on legacy versions of packages, and especially for a package without any security implications this is totally reasonable.
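For example, a single pinned line in requirements.txt is enough to keep production systems on a known-good release (the version number below is just an illustration):
biopython==1.76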
If supporting multiple versions is your goal, you could perform some basic checks during your package import process, in an __init__.py file or elsewhere. This pattern is fairly common and is especially useful for version compatibility between Python 2 & 3:
def foo_function():
    # local fallback used when the package no longer provides the function
    return

try:
    # placeholder names; substitute whatever actually moved or was removed
    from Bio.foo import foo
except (ImportError, AttributeError):
    foo = foo_function

foo()
I have seen this countless times in the wild on GitHub--of course now that I try to find an example I cannot--but I will update this answer with an example when I do.
EDIT: If it's good enough for NumPy, it's probably good enough for the rest of us. numpy_base.pyi L7-13
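If an explicit runtime check is preferred over the try/except import above, you can also branch on the installed version directly. A minimal sketch, assuming Bio.__version__ is available (it is in recent Biopython releases); the 1.78 threshold and Bio.Alphabet are only examples of the kind of removal the question describes:
import Bio

def biopython_at_least(major, minor):
    # crude numeric parse of a version string like "1.76"
    parts = Bio.__version__.split(".")
    return (int(parts[0]), int(parts[1])) >= (major, minor)

if biopython_at_least(1, 78):
    generic_dna = None                     # new code path: the module is gone
else:
    from Bio.Alphabet import generic_dna   # old code path still works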

Related

Less tedious way to generate poetry dependencies?

Every time I start a new poetry project, I have to go through a tedious process of listing dependencies. These include:
Running poetry add for every dependency one by one, even though they are already listed in my imports
Source diving to figure out the actual minimum version of each package, given the minimal functionality I use
Going down the rabbit hole of CPython code to figure out the minimum version of Python
I don't really like the Poetry approach of just requiring whatever version I have installed. As a developer, I tend to install bleeding-edge versions of packages and Python, which many of my users don't have. I then get annoying bug reports that come down to "the Python version is wrong", where the user is often very confused by the error messages. The process of finding minimum dependency versions is typically not very complicated; it's just tedious and doesn't scale.
Surely there is a tool out there that can do some static analysis and get me started with a basic dependency list? I understand that a perfect solution would likely be a lot of work, but a partial solution would be good enough for me. So long as it takes care of the bulk of the tedious work, I don't mind dealing with the handful of remaining corner cases by hand.
PyCharm seems able to at least compare the package names in requirements.txt to my imports. Unfortunately this doesn't work for Poetry dependencies, not even with the Poetry PyCharm Plugin installed.
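As a rough illustration of the kind of static analysis asked for above, a sketch along these lines (the helper below is my own, not an existing tool) collects top-level import names from a project's .py files as a starting point for a dependency list; mapping those names to PyPI packages and finding minimum versions would still be manual:
import ast
import pathlib

def imported_modules(project_dir="."):
    # collect top-level module names from all .py files under project_dir
    names = set()
    for path in pathlib.Path(project_dir).rglob("*.py"):
        tree = ast.parse(path.read_text(encoding="utf-8"))
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                names.update(alias.name.split(".")[0] for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
                names.add(node.module.split(".")[0])
    return sorted(names)

print(imported_modules())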

Type hints for lxml?

I'm new to Python and come from a statically typed language background. I want type hints for https://lxml.de just for ease of development (mypy flagging issues and suggesting methods would be nice!).
To my knowledge, this is a Python 2.0 era module and doesn't have type hints. Currently I've used https://mypy.readthedocs.io/en/stable/stubgen.html to create stub type definitions and have been filling in the "Any" types I'm using with more information, but it's really hacky. Are there any safer ways to get type hints?
There is an official stubs package for lxml now called lxml-stubs:
$ pip install lxml-stubs
Note, however, that the stubs are still in development and are not 100% complete yet (although very much usable from my experience). These stubs were once part of typeshed, then curated by Jelle Zijlstra after removal and now are developed as part of the lxml project.
If you want the development version of the stubs, install via
$ pip install git+https://github.com/lxml/lxml-stubs.git
(the project's readme installation command is missing the git+ prefix in the URL's scheme and won't work).
Recently I have done much more gap filling based on lxml-stubs, with good progress.
Anyone still interested is welcome to check out types-lxml. For most people, I think lxml.objectify is the only piece still missing from those stubs; support for it is planned immediately after the current release.
I’ve used stubgen to create stub type definitions and filling in “any”-types
This is actually the correct approach for packages other than lxml; creating a template with mypy's stubgen is the starting point for many stub files. But lxml is mostly written in Cython, for which stubgen does not have perfect support yet. Besides, as the OP noted, this is a Python 2.0 era module, and the author uses function arguments in a quite polymorphic way. There are lots of unique challenges in annotating lxml, since lxml is essentially a Python interface over libxml2 and libxslt at its core.
As an example, supporting both unicode and bytes input complicates matters too; this is the same difficulty found when annotating the xml.etree package bundled with Python, but to a much greater degree.
I would not call this "hacky"; rather, it is gradual typing.
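For reference, the stub template the OP started from comes from a one-line stubgen invocation (the output directory name here is arbitrary):
$ stubgen -p lxml -o out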
You can take a closer look at the lxml-stubs repository. From its About description:
This repository contains external type annotations (see PEP 484) for the lxml package. Such type annotations are normally included in typeshed, but lxml's annotations were frequently problematic and have therefore been deleted from typeshed. In particular, the stubs are incomplete and it has been difficult to provide complete stubs.
Perhaps it will be useful to you.

concurrent.futures with multinest

What is the best Python-based MultiNest package that is optimized for multiprocessing with concurrent.futures?
I've had issues getting MultiNest to use all of my CPUs with anything but multiprocessing.Pool, but the Python MultiNest implementations don't seem to be able to use that.
On the GitHub issues section for dynesty (one of the two most common pure-Python MultiNest implementations), we discussed this as well:
https://github.com/joshspeagle/dynesty/issues/100
There was not a very settled, final explanation, but the thinking was that:
(1) The cost function is not expensive enough to require all of the cores at once.
(2) The bootstrap flag should be set to 0 to avoid bootstrapping; it's a trick implemented for speed that seems to be interfering.
I've used Nestle (github.com/kbarbary/nestle) and Dynesty (github.com/joshspeagle/dynesty); they both seem to have this problem no matter the complexity of the cost function.
I have had great success using PyMultiNest (github.com/JohannesBuchner/PyMultiNest), but it requires the Fortran version of MultiNest (github.com/JohannesBuchner/MultiNest), which is very difficult to install correctly -- you need to manually install OpenMPI. Both MultiNest and OpenMPI can have compiler issues depending on the OS, system, and configuration thereof.
I would suggest using PyMultiNest, except that it's so hard to install; using Dynesty and Nestle is trivial, but they have had this issue with full parallelization.
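For reference, here is a rough sketch of pointing dynesty at a concurrent.futures pool with bootstrapping disabled, per the issue discussion above; the log-likelihood and prior transform are toy placeholders, and it assumes dynesty's documented pool, queue_size, and bootstrap keywords (dynesty only needs an object with a map method):
import concurrent.futures
import numpy as np
import dynesty

NDIM = 3

def loglike(theta):
    # toy Gaussian log-likelihood
    return -0.5 * np.sum(theta ** 2)

def prior_transform(u):
    # map the unit cube to the box [-5, 5] in each dimension
    return 10.0 * u - 5.0

if __name__ == "__main__":
    with concurrent.futures.ProcessPoolExecutor(max_workers=4) as pool:
        sampler = dynesty.NestedSampler(
            loglike, prior_transform, NDIM,
            pool=pool, queue_size=4,   # queue_size should roughly match the worker count
            bootstrap=0,               # disable the bootstrap expansion trick
        )
        sampler.run_nested()
        results = sampler.results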

How do I disable version parsing in cabal or stack?

I am using an alternative version-numbering approach for my projects. I have encountered strange behavior from cabal and stack that does not allow me to fully enjoy the benefits of this approach. Both cabal and stack enforce the version to be of the format Int.Int.Int, which does not cover the other version format I use for branches (0.x.x, 1.x.x, 1.0.x, etc.).
If I have the line version: 0.x.x in my .cabal file, I get a Parse of field 'version' failed. error when running cabal build, or Unable to parse cabal file {PROJECT_NAME}.cabal: NoParse "version" 5 when running stack init.
Is there a way to disable version parsing on cabal and stack commands? Is there a flag for it? Or do I have to request this kind of change (adding flags, disabling version parsing) from the developers of cabal and stack?
Why is there any parsing at all? How does it help with building a package? Does cabal or stack automatically increment build numbers on some event? If yes, where could I read more about this? How could I influence the way version-number incrementation gets implemented in cabal and stack? I want developers of Haskell packages to take into account the possibility of alternative version-numbering approaches.
PS. For all interested folks, I want to quickly summarize the idea behind "weird" version numbers, such as 0.x.x, 1.x.x, 1.0.x. I use the version numbers with x's to describe streamlines of development that allow code changes, while version numbers such as 1.0.0, 1.1.0, 2.35.46 are used to describe frozen states of development (to be precise, they are used for released versions of software). Note that version numbers such as 0.x.0, 1.x.15, 2.x.23 are also possible (used for snapshots/builds of software); they mean that the codebase has been inherited from the branches with version numbers 0.x.x, 1.x.x and 2.x.x respectively.
Why do I need version numbers such as 0.x.x, 1.x.x and 2.x.x at all? In brief, a different number of x's means a different type of branch. For example, the version number pattern N.x.x is used for support branches, while the pattern N.M.x is used for release branches. The idea behind support branches is that they get created due to incompatibility of the corresponding codebases. Release branches get created due to a feature freeze in the corresponding codebase. For example, branches 1.0.x, 1.1.x, 1.2.x, ... get created as a result of feature freezes (or releases) in branch 1.x.x.
I know this is all confusing, but I worked hard to establish this version-numbering approach, and I continue to raise awareness about the inconsistencies of version numbering through my presentations and other projects. This all makes sense once you think more about the pitfalls of the semver approach (you can find a detailed SlideShare presentation on the matter by following the link). But I do not want to defend it for now. For the time being, I just want cabal and stack to stop enforcing their, as I perceive them, unjustified rules on my project. Hope you can help me with that.
You can't. The version will be parsed into a Version, which is:
data Version = PV0 {-# UNPACK #-} !Word64
             | PV1 !Int [Int]
Stack uses Cabal as a library but has its own Version type:
newtype Version =
  Version {unVersion :: Vector Word}
  deriving (Eq,Ord,Typeable,Data,Generic,Store,NFData)
Neither cabal nor stack has a way to customize the parsing. You would have to write your own variant of those programs if you want to use another version type. But then again, you're not winning anything at that point: neither Hackage nor Stackage will recognize your package's version.
So 1.x.x isn't possible at the moment. You could replace x with 99999999 or something similar to mitigate the problem. That being said, it's not clear what cabal install should then install. The 99999999 version? Or the latest stable variant?
If you can express the semantics, a discussion on the mailing list as well as a feature request might change the behaviour in the (far away) future, but for now, you either have to patch the programs yourself or use another numbering scheme.
Is there a way to disable version parsing on cabal and stack commands? Is there a flag for it?
No.
Or do I have to request this kind of change (adding flags, disabling version parsing) from the developers of cabal and stack?
You can of course ask, but there are so many outstanding issues that you are unlikely to get any traction. You will have to be very convincing -- convincing enough to overturn more than 20 years of experience that says the current versioning scheme is basically workable. Realistically, if you want this to happen you'll probably have to maintain a fork of these tools yourself, and provide an alternative place to host packages using this scheme.
Why is there any parsing at all? How does it help with building a package?
Packages specify dependencies, and for each dependency, specify what version ranges they work with. The build tools then use a constraint solver to choose a coherent set of package/version pairs to satisfy all the (transitive) dependencies. To do this, they must at a minimum be able to check whether a given version is in a given range -- which requires parsing the version number at least a little bit.
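For illustration, those ranges live in the build-depends field of a library or executable stanza in the .cabal file; the package names and bounds below are arbitrary examples:
build-depends: base       >=4.7  && <5
             , bytestring >=0.10 && <0.11
             , text       >=1.2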
Does cabal or stack automatically increment build numbers on some event? If yes, where could I read more about this?
There is nothing automatic. But you should take a look at the Package Versioning Policy (PVP), which serves as a social contract between package maintainers. It lets one package maintainer say, "I am using bytestring version 0.10.0.1 and it seems to work. I'm being careful about qualifying all my bytestring imports; therefore I can specify a range like >=0.10 && <0.11 and be sure that things will just work, while giving the bytestring maintainer the ability to push security and efficiency updates to my users." without having to pore through the full documentation of bytestring and hope its maintainer had written about what the version numbers mean.
How could I influence the way version numbering incrementation gets implemented in cabal and stack?
As with your previous question about changing the way the community does things, I think modifications to the Package Versioning Policy are going to be quite difficult, especially changes as radical as you seem to be proposing here. The more radical the change, the more carefully motivated it will have to be to gain traction.
I honestly don't know what a reasonable place to take such motivation and discussion would be; perhaps the haskell-cafe mailing list or similar.

How to Be Python 3 Ready?

What are the current rules for writing Python code that will pass cleanly through 2to3, and what practices seem best suited to writing code that will not become mired forever in version 2?
I have read on the SciPy/NumPy forums that "100% test coverage" (unit testing) is important to many people, and I am not sure that would apply to everybody. Certainly having a reasonable set of unit tests to try your code out with after conversion seems a sane step.
Are there other things? What are skilled Pythonistas doing if they are writing 2.x code that they hope will come through "cleanly" in the 2to3 process?
I am looking for specific instances of "[don't] do this" as well as some more general "best-practices", but specific instances of "do's and don'ts" are helpful.
Let's assume that frameworks, libraries (Django, SciPy/NumPy), and every other C Extension we need gets ported to Python3 eventually, and I'm asking about how you write and maintain the pure python language code that you write yourself.
Update: It's possible that what I really want is the "style guide" and the list of deprecated features that everybody was already staying away from. I cut my teeth on Python 1.5 and moved to 2.0, then did not really follow much of the 2.5/2.6 era; I have used them, but really my code is more 2.1-era.
I'd say:
Read the "What's New In Python 3.0" document. Very informative.
In particular, if you care about Unicode or text encodings at all, take the time to understand what has changed in 3.x. That's probably one of the trickier things to change for Python 3.x (see the short example after this list).
Get Python 2.6 or 2.7, and run your code with the -3 flag. It will tell you about things in your code that will need changing.
Before using 3rd-party packages, check to see if they have a Python 3.x version. If not, check the package web site, mailing lists, version control repositories etc to see how actively the package is being developed and whether there is a roadmap towards Python 3.x support.
Download Python 3.x and try it out! Admittedly, that might not be practical if you care about code that currently depends on packages that don't yet support Python 3.x (e.g. wxPython or Django).
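As a small example of the Unicode point above (the variable names are arbitrary): Python 3 separates text (str) from binary data (bytes), so implicit mixing that Python 2 tolerated now fails and must be made explicit:
data = b"header:"
text = "value"
try:
    combined = data + text                   # TypeError under Python 3
except TypeError:
    combined = data + text.encode("utf-8")   # explicit encoding is required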
