Type hints for lxml? - python-3.x

New to Python and come from a statically typed language background. I want type hints for https://lxml.de just for ease of development (mypy flagging issues and suggesting methods would be nice!)
To my knowledge, this is a python 2.0 module and doesn’t have types. Currently I’ve used https://mypy.readthedocs.io/en/stable/stubgen.html to create stub type definitions and filling in “any”-types I’m using with more information, but it’s really hacky. Are there any safer ways to get type hints?

There is an official stubs package for lxml now called lxml-stubs:
$ pip install lxml-stubs
Note, however, that the stubs are still in development and are not 100% complete yet (although very much usable from my experience). These stubs were once part of typeshed, then curated by Jelle Zijlstra after removal and now are developed as part of the lxml project.
If you want the development version of the stubs, install via
$ pip install git+https://github.com/lxml/lxml-stubs.git
(the project's readme installation command is missing the git+ prefix in URL's scheme and won't work).

Recently I have done much more gap filling based on lxml-stubs with some good progress.
Welcome to check out types-lxml if any late comer are still interested. For most people I think lxml.objectify is the only missing piece lacking from the stubs, which is planned immediately after current release.
I’ve used stubgen to create stub type definitions and filling in “any”-types
This is actually the correct approach if it's not lxml; creating template from mypy stubgen is the starting point for many stub files. But lxml is mostly written in Cython, for which stubgen do not have perfect support yet. Besides as OP noted, this is a python 2.0 era module, and author uses function arguments in a quite polymorphous way. There are lots of unique challenges annotating lxml, as lxml is essentially a python interface for libxml and libxslt in its core.
As an example, the support of both unicode and bytes input complicates matter too; this is the same difficulty found when annotating xml.etree bundled with python, but in a much greater magnitude.

I would not call this "hacky", rather it is gradual typing.
You can take a closer look at lxml-stubs repository. From about:
This repository contains external type annotations (see PEP 484) for the lxml package. Such type annotations are normally included in typeshed, but lxml's annotations were frequently problematic and have therefore been deleted from typeshed. In particular, the stubs are incomplete and it has been difficult to provide complete stubs.
Perhaps it will be useful to you

Related

How do indicate that a Haskell package is in either an alpha/beta/release candidate stage?

Let us say that I have worked on a haskell library and am now ready to release a beta version of the software to hackage/make repo public on github etc.
Possible Solutions and why they do not work for me
Use packagename-0.0.0.1-alpha or similar.
The problem here is quite simple: The Haskell PVP Specification does not allow it: (bold is me)
The components of the version number MUST be numbers! Historically Cabal supported version numbers with string tags at the end, e.g. 1.0-beta This proved not to work well because the ordering for tags was not well defined. Version tags are no longer supported and mostly ignored, however some tools will fail in some circumstances if they encounter them.
Just use packagename-0.* until it is out of alpha/beta (and then use packagename-1.*).
The problem here is twofold:
This method would not work for describing relase candidates which are post version 1.
Programmers from other ecosystems, such as that of rust, where it is quite common to have a stable library in 0.*, might wrongly assume that this library is stable. (Of course, it could be mitigated somewhat with a warning in the README, but I would prefer a better solution still.)
So, what is the best (and most conventional in haskell) way to indicate that the library version is in alpha/beta stage of development or is a release candidate?
As far as I know, there is not a package-wide way to say this. However, you can add a module description that describes the stability of the module to each of your modules' documentation.
{-|
Stability: experimental
-}
module PackageName.ModuleName where

Support several versions of a package if version change breaks code

I have a Python3 project which uses Biopython package. One of its modules got removed in the latest version so I have to change a small piece of code to support this change. On the other hand this change would break my code for all "old" version of Biopython (which are heavily used on productive systems).
My questions:
What is the proper way to deal with this?
If this makes sense: How do I support old and new package versions at the same time? Do I perform a run time check to see which version I have an then run different code? Or is this a bad idea? If you think this is the way to go: Is there a standard way to do this?
The simplest way to ensure a specific version is present is to pin that version in your requirements.txt file (or other dependency specifications). There are plenty of systems which rely on legacy versions of packages, and especially for a package without any security implications this is totally reasonable.
If supporting multiple versions is your goal, you could perform some basic checks during your package import process, in an __init__.py file or elsewhere. This pattern is somewhat common, especially useful for version compatibility between Python 2 & 3:
def foo_function():
return
try:
import biopython.foo as foo
except (ImportError, AttributeError):
foo = foo_function
foo()
I have seen this countless times in the wild on GitHub--of course now that I try to find an example I cannot--but I will update this answer with an example when I do.
EDIT: If it's good enough for Numpy, it's probably good enough for the rest of us. numpy_base.pyi L7-13

Haskell, Hackage, GHC and productivity. How to solve a real example?

I don't know the best way to solve a simple (probably) problems (hackage related).
I asked for help about it (http://stackoverflow.com/questions/12841599/haskell-hackage-ghc-and-productivity-what-to-do) but I knew not explain well.
Today, I'm with a this kin problem.
The concrete problem isn't relevant, but is it:
`Write a function that, given a string, remove diacritics.`
Example:
`simpleWord "Cigüeñal" <-> "Ciguenal"
The correct way (I think) is to use the standard Unicode normalization. In some languages/frameworks (.Net, PHP, Python, ...) exist some related function.
In Haskell, thanks to hackage community exist too:
`Text.Unicode.Normalization.normalize`
But, I couldn't install with (eg) ghc-7.4 but compact-string (that depends of) fail.
A fix for compact-string exists (compact-string-fix) then: can't I use cabal to install (directly)?, should I download and patch it?, should I look for another alternative to function about?
I explained a concrete real case (simple or complex, don't care), the question (that I ask help for) is how can, a novice haskeller, know the best way to select correct libraries, ghc correct (balanced) version, without hit a wall.
I'm really lost about it.
Really, thank you very much for any suggestion.
Best regards.
The documentation for compact-string says, "This package is obsolete. Use text instead.".
The documentation for text says, "To use an extended and very rich family of functions for working with Unicode text (including normalization, regular expressions, non-standard encodings, text breaking, and locales), see the text-icu package.".
The documentation for text-icu shows that it successfully builds on GHC 7.4 and has support for Unicode normalization.
Here's the general process I follow when deciding which packages to use. First, I try to identify multiple packages that meet my needs. Then I look more closely at each package to try to determine which ones are the best for me, according to the criteria listed below.
It's usually better to use packages that are currently maintained. To determine if a package is currently maintained, I check the "Upload date" link on the package description page. (Of course, there are some old tried-and-true packages that haven't been modified in ages because they don't need modification.)
It's usually better to use packages that are mature, so I check the version number on the package description page. A package with a version number of 7.3.5 is probably more mature than a version 0.1 package.
It's usually better to use packages that are well documented. Sometimes there's a nice example of how to use the package in the Haddock documentation (yay!). I'll also check the "Home page" link on the package description page, because often there will be more documentation there.
It's usually better to use packages that are popular, because any problems will probably be addressed quickly, and other users can answer questions. I'll usually do a Google search and see whch packages are mentioned most often on Haskell mailing lists and StackOverflow.
It's usually better to use packages that don't require a lot of packages I don't already have, so I check the "Dependencies" section on the package description page.
I tend to follow this procedure when choosing a package for any programming language, not just Haskell.

How to Be Python 3 Ready?

What are the current rules for writing python code that will pass cleanly through 2to3 and what are the practices that seem to be best suited to writing code that will not become mired forever in version 2.
I have read from the SciPy/NumPy forums that "100% test coverage" (unit testing) is important for many people, and I am not sure if that would apply to everybody. Certainly having a reasonable set of unit tests to try your code out with after conversion, seems a sane step.
Are there other things? What are skilled Pythonistas doing if they are writing 2.x code that they hope to have come through "cleanly" in the 2to3 process.
I am looking for specific instances of "[don't] do this" as well as some more general "best-practices", but specific instances of "do's and don'ts" are helpful.
Let's assume that frameworks, libraries (Django, SciPy/NumPy), and every other C Extension we need gets ported to Python3 eventually, and I'm asking about how you write and maintain the pure python language code that you write yourself.
Update: It's possible that what I really want is the "style guide" and list of deprecated features that everybody was already staying away from. I cut my teeth on Python 1.5 and moved to 2.0, and then have not really followed much of the 2.5/2.6 era, used them but really my code is more 2.1 era.
I'd say:
Read the "What's new for Python 3.0". Very informative.
In particular, if you care about Unicode or text encodings at all, take the time to understand what has changed for 3.x. That's probably one of the trickier things to change for Python 3.x.
Get Python 2.6 or 2.7, and run your code with the -3 flag. It will tell you about things in your code that will need changing.
Before using 3rd-party packages, check to see if they have a Python 3.x version. If not, check the package web site, mailing lists, version control repositories etc to see how actively the package is being developed and whether there is a roadmap towards Python 3.x support.
Download Python 3.x and try it out! Admittedly, that might not be practical if you care about code that currently depends on packages that don't yet support Python 3.x (e.g. wxPython or Django).

The right language for OpenGL UI prototyping. Ditching Python

So, I got this idea that I'd try to prototype an experimental user interface using OpenGL and some physics. I know little about either of the topics, but am pretty experienced with programming languages such as C++, Java and C#. After some initial research, I decided on using Python (with Eclipse/PyDev) and Qt, both new to me, and now have four different topics to learn more or less simultaneously.
I've gotten quite far with both OpenGL and Python, but while Python and its ecosystem initially seemed perfect for the task, I've now discovered some serious drawbacks. Bad API documentation and lacking code completion (due to dynamic typing), having to import every module I use in every other module gets tedious when having one class per module, having to select the correct module to run the program, and having to wait 30 seconds for the program to start and obscure the IDE before being notified of many obvious typos and other mistakes. It gets really annoying really fast. Quite frankly, i don't get what all the fuzz is about. Lambda functions, list comprehensions etc. are nice and all, but there's certainly more important things.
So, unless anyone can resolve at least some of these annoyances, Python is out. C++ is out as well, for obvious reasons, and C# is out, mainly for lack of portability. This leaves Java and JOGL as an attractive option, but I'm also curious about Ruby and Groovy. I'd like your opinion on these and others though, to keep my from making the same mistake again.
The requirements are:
Keeping the hell out of my way.
Good code completion. Complete method signatures, including data types and parameter names.
Good OpenGL support.
Qt support is preferable.
Object Oriented
Suitable for RAD, prototyping
Cross-platform
Preferably Open-Source, but at least free.
It seems you aren't mainly having a problem with Python itself, but instead with the IDE.
"Bad API documentation"
To what API? Python itself, Qt or some other library you are using?
"lacking code completion (due to dynamic typing)"
As long as you are not doing anything magic, I find that PyDev is pretty darn good at figuring these things out. If it gets lost, you can always typehint by doing:
assert isinstance(myObj, MyClass)
Then, PyDev will provide you with code completion even if myObj comes from a dynamic context.
"having to import every module I use in every other module gets tedious when having one class per module"
Install PyDev Extensions, it has auto-import on the fly. Or collect all your imports in a separate module and do:
from mymodulewithallimports import *
"having to select the correct module to run the program"
In Eclipse, you can set up a default startup file, or just check "use last run configuration". Then you never have to select it again.
"before being notified of many obvious typos and other mistakes"
Install PyDev Extensions, it has more advanced syntax checking and will happily notify you about unused imports/variables, uninitialized variables etc.
Looking just at your list I'd recommend C++; especially because Code Completion is so important to you.
About Python: Although I have few experience with OpenGL programming with Python (used C++ for that), the Python community offers a number of interesting modules for OpenGL development: pyopengl, pyglew, pygpu; just to name a few.
BTW, your import issue can be resolved easily by importing the modules in the __init__.py files of the directory the modules are contained in and then just importing the "parent" module. This is not recommended but nonetheless possible.
I don't understand why nobody has heard of the D programing language?
THIS IS THE PERFECT SOLUTION!!!!
The only real alternative if you desire all those things is to use Java, but honestly you're being a bit picky about features. Is code completion really that important a trait? Everything else you've listed is traditionally very well regarded with Python, so I don't see the issue.
The text editor (not even an IDE) which I use lets you import API function definitions. Code completion is not a language feature, especially with OpenGL. Just type gl[Ctrl+I] and you'd get the options.
I tried using Java3D and java once. I realized Java3D is a typical Java API... lots of objects to do simple things, and because it's Java, that translates to a lot of code. I then moved to Jython in Eclipse to which cleaned up the code, leaving me with only the complexity of Java3D.
So in the end, I went in the opposite direction. One advantage this has over pure python is I can use Java with all of Eclipse's benefits like autocomplete and move it over to python when parts get unwieldy in Java.
It seems like Pydev can offer code completion for you in Eclipse.
I started off doing OpenGL programming with GL4Java, which got migrated to JOGL and you should definately give it (JOGL) a try. Java offers most of the features you require (plus Eclipse gives you the code completion) and especially for JOGL there are a lot of tutorials out there to get you started.
Consider Boo -- it has many of Python's advantages while adopting features from elsewhere as well, and its compile-time type inference (when variables are neither explicitly given a specific type or explicitly duck typed) allows the kind of autocompletion support you're asking about.
The Tao.OpenGL library exposes OpenGL to .NET apps (such as those Boo compiles), with explicit support for Mono.
(Personally, I'm mostly a Python developer when not doing C or Java, but couldn't care less about autocompletion... but hey, it's your question; also, the one-class-per-module convention seems like a ridiculous amount of pain you're putting yourself through needlessly).

Resources