Managing non-code resources in PyOxidizer

I am trying to build an executable from Python (3.9) source in PyOxidizer.
One of the packages uses importlib.resources to access non-source code files in a subdirectory of itself.
package_root/
    resource_dir/
        resource.file
    __init__.py
    resource_user.py
This is working fine in the python interpreter, but dies in the executable built by PyOxidizer.
The code in resource_user.py is using importlib.resources thus:
from importlib import resources

...

def _get_resource(package, resource):
    ...
    path = resources.files(package)
    ...

loaded_resource = _get_resource(__package__, "resource.file")
The executable dies on the call to resources.files().
I'm hoping this is a tweak to the python_executable or policy objects in pyoxidizer.bzl, but I can't see what it is. Preferably, I want the resources included within the in-memory packages, rather than being sideloaded adjacent files.
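For reference, the packaging policy in pyoxidizer.bzl does expose resource-location knobs. A sketch of the relevant section, assuming a recent PyOxidizer release (attribute and function names as documented for PythonPackagingPolicy; the executable name is hypothetical):

```
def make_exe():
    dist = default_python_distribution()
    policy = dist.make_python_packaging_policy()
    # Prefer in-memory resources, but allow a filesystem-relative
    # fallback for packages whose resources cannot load from memory.
    policy.resources_location = "in-memory"
    policy.resources_location_fallback = "filesystem-relative:prefix"
    exe = dist.to_python_executable(
        name = "myapp",
        packaging_policy = policy,
    )
    return exe
```

Whether resources.files() works against in-memory resources depends on the PyOxidizer/oxidized_importer version, so the fallback line is the part to experiment with.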

Related

Relative imports within a git repo

I want to create a git repo that can be used like this:
git clone $PROJECT_URL my_project
cd my_project
python some_dir/some_script.py
And I want some_dir/some_script.py to import from another_dir/some_module.py.
How can I accomplish this?
Some desired requirements, in order of decreasing importance to me:
No sys.path modifications from within any of the .py files. This leads to fragility when doing IDE-powered automated refactoring.
No directory structure changes. The repo has been thoughtfully structured.
No changes to my environment. I don't want to add a hard-coded path to my $PYTHONPATH for instance, as that can result in unexpected behavior when I cd to other directories and launch unrelated python commands.
Minimal changes to the sequence of 3 commands above. I don't want a complicated workflow, I want to use tab-completion for some_dir/some_script.py, and I don't want to spend keystrokes on extra python cmdline flags.
I see four solutions to my general problem described here, but none of them meet all of the above requirements.
If no solution is possible, then why are things this way? This seems like such a natural want, and the requirements I list seem perfectly reasonable. I'm aware of a religious argument in a 2007 email from Guido:
I'm -1 on this and on any other proposed twiddlings of the __main__
machinery. The only use case seems to be running scripts that happen
to be living inside a module's directory, which I've always seen as an
antipattern. To make me change my mind you'd have to convince me that
it isn't.
But not sure if things have changed since then.
Opinions haven't changed on this topic since Guido's 2007 comment. If anything, Python has moved even further in the opposite direction, with the addition of the PYTHONSAFEPATH environment variable and the corresponding -P option in 3.11:
https://docs.python.org/3/using/cmdline.html#envvar-PYTHONSAFEPATH
https://docs.python.org/3/using/cmdline.html#cmdoption-P
These options will nerf direct sibling module imports too, requiring sys.path to be explicitly configured even for scripts!
So, scripts still can't easily do relative imports, and executable scripts living within a package structure are still considered an anti-pattern. What to do instead? The widely accepted alternative is to use the packaging feature of entry points. One type of entry-point group in packaging metadata is the "console_scripts" group, used to point at arbitrary callables defined within your package code. If you add entries to this group in your package metadata, script wrappers for those callables will be auto-generated and put somewhere on $PATH at pip install time. No hacking of sys.path necessary.
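A minimal sketch of such an entry point in pyproject.toml, assuming a hypothetical package my_proj that exposes a main() callable in a my_proj.cli module:

```
[project.scripts]
my-command = "my_proj.cli:main"
```

After pip install, a my-command wrapper script appears on $PATH and invokes my_proj.cli.main() with sys.path already configured.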
That being said, it's still possible to run .py files directly as scripts, provided you've configured the underlying Python environment for them to resolve their dependencies (imports) correctly. To do that, you'll want to define a package structure and "install" the package so that your source code is visible on sys.path.
Here's a minimum example:
my_project
├── another_dir
│   ├── __init__.py     <-- __init__ file required for package dirs (it can be empty)
│   └── some_module.py
├── pyproject.toml      <-- packaging metadata lives here
└── some_dir            <-- no __init__ file necessary for non-packaged subdirs
    └── some_script.py
Minimal contents of the packaging definition in pyproject.toml:
[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"

[project]
name = "my_proj"
version = "0.1"

[tool.setuptools.packages.find]
namespaces = false
An additional once-off step is required to create/configure an environment in between the git clone and the script execution:
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
This makes sure that another_dir is importable from the environment's site-packages directory, which is already one of the locations on sys.path (check with python -m site). That's what's required for any of these import statements to work from within the script file(s):
from another_dir import some_module
import another_dir.some_module
from another_dir.some_module import something
Note that this does not necessarily put the parent of another_dir onto sys.path directly. For an editable install, it will set up some scaffolding which makes your package appear to be "installed" in the site, which is sufficient for those imports to succeed. For a non-editable install (pip install without the -e flag), it will just copy your package directly into the site, compile the .pyc files, and then the code will be found by the normal SourceFileLoader.

Source code getting packaged in python wheel

We are using wheels to deploy our code to QA/Production. Recently we realized that wheel packages actually store our source code, and a command as simple as the one below will expose all of it:
unzip package.whl
The command used for wheel creation is as below:
cd /path/to/source/code/folder
python setup.py bdist bdist_wheel
So,
Is there any way to create wheels which creates binary and stores in package rather than source code?
In the simplest sense, wheel is just:
a zip file
with a specific filename
and a specific directory layout
containing pure-Python source code
and any platform-specific binaries
This means that a wheel (and any other distribution) is not a binary itself, but it may contain platform-specific binaries -- for example, if you are building/compiling some C code along with your Python package.
Most wheels are pure-Python, which means that they only contain Python source code.
It seems like you're asking how to "compile" Python code into an obfuscated binary. This is not the goal of a wheel. You might want to read more details on the wheel format here: https://www.python.org/dev/peps/pep-0427/
Is there any way to create wheels which creates binary and stores in package rather than source code?
Not with the wheel format. If this is actually your goal, you may want to look into pyinstaller, py2exe or cython, depending on the target platform.
In case someone stumbles here the same way I did: if you
Use Cython to pre-compile your library.
Have not only .pyx but also .py modules -- for example, you want to do this with some existing project without any modifications (except for setup.py), or consider such modifications unreasonable in the first place, as Cython consumes .py files too.
Want to distribute a pre-compiled library without any .py files included (except perhaps empty __init__.py files).
Then, you can apply the following (quite dirty) solution to exclude any files you want from the wheel:
import glob
import os

from setuptools import setup
from wheel.bdist_wheel import bdist_wheel

class CommandBdistWheel(bdist_wheel):
    # Called almost exactly before filling the `.whl` archive
    def write_wheelfile(self, *args, **kwargs):
        dr = f'{self.bdist_dir}/<package name>'
        paths = [
            path for path in glob.glob(f'{dr}/**/*.py', recursive=True)
            if os.path.basename(path) != '__init__.py'
        ]
        for path in paths:
            os.remove(path)
        super().write_wheelfile(*args, **kwargs)

setup(
    # ...
    cmdclass={'bdist_wheel': CommandBdistWheel},
    # ...
)
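To confirm the stripping worked, you can inspect the built wheel's contents with the stdlib zipfile module. A small checker, sketched here (the wheel path in the usage comment is hypothetical):

```python
import zipfile

def stray_sources(wheel_path: str) -> list[str]:
    # Return any .py files (other than __init__.py) still inside the wheel.
    with zipfile.ZipFile(wheel_path) as whl:
        return [name for name in whl.namelist()
                if name.endswith('.py') and not name.endswith('__init__.py')]

# Hypothetical usage after building:
# assert stray_sources('dist/my_proj-0.1-cp39-cp39-linux_x86_64.whl') == []
```

An empty result means only compiled extensions and __init__.py stubs made it into the archive.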

Copy non python files via package_data to Scripts directory

I have some scripts in my package, that rely on some template xml files.
Those scripts are callable by entry points and I wanted to reference the template files by a relative path.
When calling the script via python -m ... the scripts themselves are called from within lib\site-packages and there the xml files are available as I put them in my setup.py like this:
setup(
    ...
    packages=['my_pck'],
    package_dir={'my_pck': 'python/src/my_pck'},
    package_data={'my_pck': ['reports/templates/*.xml']},
    ...
)
I know, I could copy those templates also by using data_files in my setup.py but using package_data seems better to me.
Unfortunately package_data seems not to copy those files to the Scripts folder where the entry points are located.
So my question is, is this even achievable via package_data and if, how?
Or is there a more pythonic, easier way to achieve this? Maybe not referencing those files via paths relative to the scripts?
Looks like importlib-resources might help here. This library is able to find the actual path to a resource file packaged as package_data by setuptools.
Access the package_data files from your code with something like this:
with importlib_resources.path('my_pck.reports.templates', 'a.xml') as xml_path:
    do_something(xml_path)
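On Python 3.9+ the stdlib importlib.resources offers the same capability, with files() superseding the older path() helper. A sketch, where the package and file names from the question are assumed:

```python
from importlib.resources import files

def load_template(package: str, resource: str) -> bytes:
    # Read a packaged data file's raw bytes; works even when the
    # package is inside a zip, where no real filesystem path exists.
    return files(package).joinpath(resource).read_bytes()

# Hypothetical usage mirroring the question:
# xml_bytes = load_template('my_pck.reports.templates', 'a.xml')
```

If an external tool needs an actual on-disk path instead of bytes, wrap the traversable in importlib.resources.as_file(), which extracts to a temporary file when necessary.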

How do I recursively generate documentation of an entire project with pydoc?

I have a python project inside a specific folder named "Project 1". I want to extract all the docstrings of all the python files inside this project.
In this project all the modules are imported dynamically through __init__.py and, for that reason, when I run pydoc it fails on the imports.
python -m pydoc -w module_folder/ will work for some scenarios, but not all. For example, if you want to document the modules and submodules of an installed package, it won't work; you'd need to pivot to a different tool.
Using your favorite language you will need to:
Iterate through files in your target folder
Call pydoc once per (sub)module
Here is one of many examples on Github.
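The two steps above can be sketched in Python itself. A minimal walker, assuming the target folder is on sys.path so each module can be imported (package __init__.py handling is omitted for brevity):

```python
import pydoc
from pathlib import Path

def document_tree(root: str) -> None:
    # Emit one HTML page per module found under `root`.
    # pydoc.writedoc writes <module>.html into the current directory
    # and prints (rather than raises) on import failures.
    for py_file in Path(root).rglob('*.py'):
        rel = py_file.relative_to(root).with_suffix('')
        module_name = '.'.join(rel.parts)  # e.g. pkg/mod.py -> "pkg.mod"
        pydoc.writedoc(module_name)
```

Running document_tree('module_folder') then leaves one .html file per module in the working directory.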
Pdoc and pydoctor both handle walking folders automatically; my fork of pydoc walks the module dependency tree by default.

Packaging Multiple Python Files

I am currently using this guide to package up my project, wasp. However, at the moment everything lives inside the single wasp file.
That's not ideal. I would rather have all the classes in separate files so it can be more effectively managed. I have the series of files needed in the debian directory. But I'm not sure how to configure the packaging to package multiple files.
Is there a way to change my packaging to package more than just the one script file?
I'm not a debian package or Python expert, but one way would be to copy the various source files to another location (outside of /usr/bin), and then have /usr/bin/wasp call out to them.
Say you put all of your python code in src/ in the root of your repo. In the debian/install file, you'd have:
wasp usr/bin
src/* usr/lib/wasp/
You'd then just need /usr/bin/wasp to call some entry point in src. For example,
#!/usr/bin/python3
import sys
sys.path.append('/usr/lib/wasp/')
import wasp # or whatever you expose in src
# ...
Again, I don't know the best practices here (either in directory or python usage) but I think this would at least work!
