I want to create a git repo that can be used like this:
git clone $PROJECT_URL my_project
cd my_project
python some_dir/some_script.py
And I want some_dir/some_script.py to import from another_dir/some_module.py.
How can I accomplish this?
Some desired requirements, in order of decreasing importance to me:
No sys.path modifications from within any of the .py files. This leads to fragility when doing IDE-powered automated refactoring.
No directory structure changes. The repo has been thoughtfully structured.
No changes to my environment. I don't want to add a hard-coded path to my $PYTHONPATH for instance, as that can result in unexpected behavior when I cd to other directories and launch unrelated python commands.
Minimal changes to the sequence of 3 commands above. I don't want a complicated workflow, I want to use tab-completion for some_dir/some_script.py, and I don't want to spend keystrokes on extra python cmdline flags.
I see four solutions to my general problem described here, but none of them meet all of the above requirements.
If no solution is possible, then why are things this way? This seems like such a natural want, and the requirements I list seem perfectly reasonable. I'm aware of a religious argument in a 2007 email from Guido:
I'm -1 on this and on any other proposed twiddlings of the __main__
machinery. The only use case seems to be running scripts that happen
to be living inside a module's directory, which I've always seen as an
antipattern. To make me change my mind you'd have to convince me that
it isn't.
But not sure if things have changed since then.
Opinions haven't changed on this topic since Guido's 2007 comment. If anything, we're moving even further in the opposite direction, with the addition of the PYTHONSAFEPATH environment variable and the corresponding -P option in Python 3.11:
https://docs.python.org/3/using/cmdline.html#envvar-PYTHONSAFEPATH
https://docs.python.org/3/using/cmdline.html#cmdoption-P
These options will nerf direct sibling module imports too, requiring sys.path to be explicitly configured even for scripts!
So, scripts still can't easily do relative imports, and executable scripts living within a package structure are still considered an anti-pattern. What to do instead?! The widely accepted alternative here is to use the packaging feature of entry points. One entry-point group in packaging metadata is the "console_scripts" group, used to point at arbitrary callables defined within your package code. If you add entries in this group within your package metadata, then script wrappers for those callables will be auto-generated and put somewhere on $PATH at pip install time. No hacking of sys.path necessary.
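For illustration, a hypothetical entry in pyproject.toml could look like this (the script name my-script and the callable main are made up; [project.scripts] is the pyproject spelling of the "console_scripts" group):
[project.scripts]
my-script = "another_dir.some_module:main"
After pip install, a my-script wrapper lands on $PATH and simply calls that function.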
That being said, it's still possible to run .py files directly as scripts, provided you've configured the underlying Python environment for them to resolve their dependencies (imports) correctly. To do that, you'll want to define a package structure and "install" the package so that your source code is visible on sys.path.
Here's a minimum example:
my_project
├── another_dir
│   ├── __init__.py    <-- __init__ file required for package dirs (it can be empty)
│   └── some_module.py
├── pyproject.toml     <-- packaging metadata lives here
└── some_dir           <-- no __init__ file necessary for non-packaged subdirs
    └── some_script.py
Minimal contents of the packaging definition in pyproject.toml:
[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"
[project]
name = "my_proj"
version = "0.1"
[tool.setuptools.packages.find]
namespaces = false
An additional one-off step is required to create and configure an environment between the git clone and the script execution:
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
This makes sure that another_dir is available to import from the environment's site-packages directory, which is already one of the locations on sys.path (check with python -m site). That's what's required for any/all of these import statements to work from within the script file(s):
from another_dir import some_module
import another_dir.some_module
from another_dir.some_module import something
Note that this does not necessarily put the parent of another_dir onto sys.path directly. For an editable install, it will set up some scaffolding which makes your package appear to be "installed" in the site, which is sufficient for those imports to succeed. For a non-editable install (pip install without the -e flag), it will just copy your package directly into the site, compile the .pyc files, and then the code will be found by the normal SourceFileLoader.
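For example, some_dir/some_script.py could then be as simple as the sketch below (the function name something is just a stand-in for whatever some_module actually defines):
# some_dir/some_script.py
# Works once the package is installed into the active environment (editable
# or not), because another_dir is then resolvable via site-packages.
from another_dir.some_module import something

if __name__ == "__main__":
    something()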
For a set of programs written in most languages (C, for instance), a script can normally run those programs without any sort of interference between dynamic link libraries and with no special hand-holding, so long as they are all found on PATH. That is, the following will work:
#!/bin/bash
prog1
prog2
prog3
However, if these three programs are written in Python and they import conflicting package versions, then to run each one successfully it must either be installed into a virtualenv, or each must have a separate site-packages directory referenced by PYTHONPATH. Either way they need a setup, and possibly a teardown, before running. That is, for virtualenv:
#!/bin/bash
source $PROG1_ROOT/bin/activate
prog1
deactivate
source $PROG2_ROOT/bin/activate
prog2
deactivate
source $PROG3_ROOT/bin/activate
prog3
deactivate
and for separate site-packages:
#!/bin/bash
export PYTHONPATH=$PROG1_ROOT/lib/python3.6/site-packages
prog1
export PYTHONPATH=$PROG2_ROOT/lib/python3.6/site-packages
prog2
export PYTHONPATH=$PROG3_ROOT/lib/python3.6/site-packages
prog3
This problem results because
import pkg_resources
(at least through Python 3.6) cannot reliably import the proper versions when multiple versions of a package share the same site-packages directory, even if __requires__ precedes it, listing all the version restrictions.
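For reference, the pattern being described looks roughly like this (the package name and version pin are hypothetical):
# Declare version restrictions before pkg_resources is imported; this is the
# mechanism referred to above, which does not always resolve correctly
# through Python 3.6 when several versions share one site-packages.
__requires__ = ["somepackage==1.2.3"]
import pkg_resources
pkg_resources.require("somepackage==1.2.3")

import somepackage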
It occurs to me that if PYTHONPATH, or some equivalent, could be specified relative to the program instead of $PWD, and some consistency in directory layout was observed, then it would only have to be set once. That is, if prog1 is in $PROG1_ROOT/bin and its libraries are in $PROG1_ROOT/lib/python3.6/site-packages, then setting PYTHONPATH to "../lib/python3.6/site-packages" would work not only for prog1, but also for prog2, prog3, and for as many more as are needed through progN.
However, PYTHONPATH is normally provided as an absolute path, and relative paths are, I believe, resolved with respect to $PWD, not to the location of the python program (prog1). Is there some other Python path variable which has the desired property? Failing that, is there some type of file which could be dropped into $PROG1_ROOT/bin which would normally be picked up by a python program when it starts and which could direct it to use $PROG1_ROOT/lib/python3.6/site-packages? It would be OK to have either the relative or absolute path in that file, although the former would still be preferred, because then one could move the entire PROG1_ROOT directory tree to another location in the file system without having to rewrite this special file. I really want to avoid solutions which would require modifying the programs themselves (i.e., prog1 in the example).
Thanks.
EDITED:
I wrote this:
https://sourceforge.net/projects/python-devirtualizer/
to implement some of these ideas. At this point it is Linux (or at least POSIX) specific. It slightly modifies Python scripts in a package's "bin" directory by changing the first line, and it "wraps" everything in that directory with a replacement native binary which injects a custom PYTHONPATH into the true target's environment. That binary looks up its location using a function from libSDL2 and then specifies the PYTHONPATH relative to that. So far it has worked pretty well, and the "programs" in installed Python packages (the "bin" directory's contents) are run based on PATH just like any other program, with no futzing about with PYTHONPATH in the shell.
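To illustrate the idea (the real tool uses a native wrapper binary; this is only a rough Python equivalent, and prog1.real is a hypothetical name for the relocated original script):
#!/usr/bin/env python3
# Wrapper that injects a PYTHONPATH relative to its own location, then
# hands control to the real program.
import os
import sys

here = os.path.dirname(os.path.realpath(__file__))
env = dict(os.environ)
env["PYTHONPATH"] = os.path.normpath(
    os.path.join(here, "..", "lib", "python3.6", "site-packages")
)
target = os.path.join(here, "prog1.real")
os.execve(target, [target] + sys.argv[1:], env)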
Making search paths relative to the executable is a Very Bad Idea (TM). Move the executable or libraries around, and all hell breaks loose. Some enterprising miscreant might notice the path settings and place a script just right to get their own doctored libraries (or just flawed old versions) to be used. And so on.
Clean up the misbehaving scripts. Chances are that by using old versions they are vulnerable to by-now-fixed security boo-boos, or other misbehaviours. Or find a way to load the stuff in the script itself.
I have a Git repository which (among other things) holds Airflow DAGs in an airflow directory. I have a clone of the repository beside an install directory of Airflow. The airflow directory in Git is pointed to by the AIRFLOW_HOME configuration variable.
I would like to allow imports from modules in the repository that live outside the airflow folder (please see the structure below).
<repo root>
|_airflow
  |_dags
    |_dag.py
|_module1
|_module2
|_...
So that in dag.py I can do:
from module1 import Module1
Currently, it does not seem possible without tricks like editing sys.path explicitly, which is not very elegant and has to be done in each of the DAG source files...
Making an installable package out of module1 is also out of the question.
Rewriting the conclusion from the discussions here:
Broadly, there are two possible ways:
Package your code into an Airflow plugin
Make your code discoverable to the DAG-definition-file parsing processes by updating PYTHONPATH. Here again we have the following options:
(a) Update PYTHONPATH at the system level using .bashrc or equivalent (once and for all), or just export the updated PYTHONPATH for the current bash session
(b) Programmatically update sys.path at the beginning of each DAG-definition file (see the sketch below)
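A rough sketch of option (b), assuming the layout shown above (dag.py sits two levels below the repo root):
# dag.py
import os
import sys

# Add the repo root to sys.path so module1 / module2 become importable.
REPO_ROOT = os.path.abspath(os.path.join(os.path.dirname(__file__), "..", ".."))
if REPO_ROOT not in sys.path:
    sys.path.insert(0, REPO_ROOT)

from module1 import Module1  # must come after the sys.path tweak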
I have a Python project inside a specific folder named "Project 1". I want to extract all the docstrings of all the Python files inside this project.
In this project all the modules are imported dynamically through __init__.py and, for that reason, when I run pydoc it fails on the imports.
python -m pydoc -w module_folder/ will work for some scenarios, but not all. For example, if you want to document modules and submodules of an installed package, it won't work; you'd need to pivot to a different tool.
Using your favorite language you will need to:
Iterate through files in your target folder
Call pydoc once per (sub)module
Here is one of many examples on GitHub.
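A rough sketch of that loop (the folder name is taken from the question; paths and error handling are simplified):
# Walk the project folder and run pydoc once per module file; pydoc -w
# writes <name>.html into the current directory.
import pathlib
import subprocess
import sys

target = pathlib.Path("Project 1")
for py_file in target.rglob("*.py"):
    if py_file.name == "__init__.py":
        continue
    subprocess.run([sys.executable, "-m", "pydoc", "-w", str(py_file)], check=False)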
Pdoc and pydoctor both handle walking folders automatically; my fork of pydoc walks the module dependency tree by default.
I am taking a close look at Scons and something smells. SCons uses SConstruct files as its base configuration file. This configuration file is a Python file, but:
It does not have the .py extension
It does not have any import directives
It is not possible to have auto-completion from IDEs
Is it possible to use a variant of the SConstruct file where I could find something like the following?
# build.py
import scons
env = scons.Environment()
env.Program('foo')
It would not be simple (but it is possible) to do what you're asking. SConscripts are plain Python; however, the globals available in the context of an SConstruct or SConscript are carefully constructed.
Any user can add methods and also pass Python objects into the SConscripts via Export() or exports (in an SConscript call).
That said, try:
from SCons.Script import *
That should get you some of what you're looking for.
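So an SConstruct along these lines (still run via scons; the explicit import is mainly there so the IDE can resolve the names; a sketch, not guaranteed to surface every SConstruct global):
# SConstruct
from SCons.Script import *  # pulls in Environment and friends for the IDE

env = Environment()
env.Program('foo')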
The fact that PyCharm cannot find the symbols in question doesn't mean it is not a plain Python file.
Additionally, I'm not sure how the subject of your question relates to its contents. Typically, setup.py is a file used to build packages à la setuptools and install them via pip (or similar).
Sure, you can build whatever you are trying to build with setuptools; it will likely be harder to do, but if you get it to work, it will perhaps be easier to upload and distribute via PyPI.
P.S. It's SCons, not Scons.
I am currently using this guide to package up my project, wasp. However, everything currently lives inside the wasp file.
That's not ideal. I would rather have all the classes in separate files so the project can be managed more effectively. I have the series of files needed in the debian directory, but I'm not sure how to configure the packaging to package multiple files.
Is there a way to change my packaging to package more than just the one script file?
I'm not a Debian packaging or Python expert, but one way would be to copy the various source files to another location (outside of /usr/bin), and then have /usr/bin/wasp call out to them.
Say you put all of your Python code in src/ in the root of your repo. In the debian/install file, you'd have:
wasp usr/bin
src/* usr/lib/wasp/
You'd then just need /usr/bin/wasp to call some entry point in src. For example,
#!/usr/bin/python3
import sys
sys.path.append('/usr/lib/wasp/')
import wasp # or whatever you expose in src
# ...
Again, I don't know the best practices here (either for directory layout or Python usage), but I think this would at least work!