Python: testing & loading different git commits of same package - python-3.x

I have a python package on github, and I can install different commit versions of it using e.g. pip3 install git+https://github.com/my/package#commithash. I would like to benchmark various different commits against each other, ideally comparing two versions within the same python script, so that I can plot metrics from different versions against each other. To me, the most obvious way to do this would be to install multiple different versions of the same package simultaneously, and access them using a syntax something like
import mypackage_commithash1 as p1
import mypackage_commithash2 as p2
results1 = p1.do_something()
results2 = p2.do_something()
plot_comparison(results1, results2)
But as far as I can see, python doesn't support multiple packages of the same name like this, although https://pypi.org/project/pip3-multiple-versions goes some of the way. Does anyone have any suggestions for ways to go about doing these sorts of comparison within a python script?

That's too broad a question to give a clear answer...
Having two versions of the same project running in the same environment, in the same interpreter session, is difficult, near impossible.
First, maybe have a look at this potentially related question:
Versions management of python packages
1. From reading your question, another solution that comes to mind would be to install the 2 versions of the project in 2 different virtual environments. Then in a 3rd virtual environment I would run code that looks like this (kind of untested pseudo-code, some tweaking will be required):
import subprocess

environments = [
    'path/to/env1',
    'path/to/env2',
]

results = []
for environment in environments:
    # run the test with each environment's own interpreter
    output = subprocess.check_output(
        [
            environment + '/bin/python',
            '-c',
            'import package; print(package.do_something())',
        ],
    )
    results.append(parse_output(output))

plot_comparison(results)
2. Another approach would be to use tox to run the test program in different environments, each containing a different version of the project. Then have an extra environment to run the code that interprets and compares the results (maybe written to the filesystem?); a sketch of that idea follows.
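For illustration, a rough sketch of the "results written to the filesystem" idea; the results-*.json file naming and the JSON format are assumptions, not something tox prescribes:
# Each tox environment's test program could dump its metrics, e.g.
# json.dump({'metric': value}, open('results-envname.json', 'w')).
# The extra comparison environment then collects the files and plots them:
import glob
import json

results = []
for path in sorted(glob.glob('results-*.json')):
    with open(path) as f:
        results.append(json.load(f))

plot_comparison(results)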
3. Maybe one could try to hack something together with importlib. Install the 2 versions under 2 different paths (pip install --target ...). Then in the test code, do something like the following steps (a sketch comes after the list):
modify sys.path to include the path containing version 1
import (maybe importlib can help)
run test 1
modify sys.path to remove the path containing version 1 and include the path containing version 2
import again (maybe importlib.reload is necessary)
run test 2
compare results
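A minimal sketch of those steps, assuming the two commits were installed into separate directories with pip install --target; the package name, the install paths and do_something() are placeholders taken from the question:
import importlib
import sys

def run_with_version(install_path, package_name='mypackage'):
    # make this copy importable, import it, run the benchmark, then clean up
    sys.path.insert(0, install_path)
    try:
        pkg = importlib.import_module(package_name)
        return pkg.do_something()
    finally:
        sys.path.remove(install_path)
        # drop the cached modules so the next call loads the other copy fresh
        for name in list(sys.modules):
            if name == package_name or name.startswith(package_name + '.'):
                del sys.modules[name]

results1 = run_with_version('path/to/version1')
results2 = run_with_version('path/to/version2')
plot_comparison(results1, results2)
Note that clearing sys.modules like this is fragile for packages with C extensions or import-time side effects, which is why the subprocess approach in option 1 is generally more robust.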

Related

How to develop Python Library in DataBricks without packaging and installing after every single change?

For simplicity, say I have 2 Python scripts: one is main, one is lib. My question is how I can test my lib in main without needing to build and install the lib after every single change.
A single file can be done easily, as answered here (https://stackoverflow.com/a/67280018/18105234). What about when I have a nested library?
The idea is to perform development in Databricks like in a Jupyter Lab.
There are two approaches:
Use %run (doc) to include the "library" notebook into the "main" notebook. You need to re-execute that %run cell after every change to the library. A full example of this approach can be found in this file.
Use the new Databricks Repos functionality called arbitrary files - in this case, your library code should live in a Python file, together with a corresponding __init__.py (right now you can't use notebooks), and then you include it as a "normal" Python package using the import statement. To automatically reload changes from the package you need to use special magic commands, as shown in another example:
%load_ext autoreload
%autoreload 2
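For illustration, a rough sketch of what that could look like; the repo layout, the mylib/helpers module and the transform() function are placeholder names, not something prescribed by Databricks Repos:
# Assumed repo layout:
#   my_repo/
#     mylib/
#       __init__.py
#       helpers.py        # defines transform(df)
#     notebooks/
#       main              # the "main" notebook
#
# In the "main" notebook, after the autoreload magics above:
from mylib.helpers import transform

df = spark.range(10)    # `spark` is provided by the Databricks notebook session
result = transform(df)  # edits saved to helpers.py are picked up automatically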
The 2nd approach has more advantages, as it allows you to take the code and, for example, build a library from it, or apply more code checks that aren't possible with notebooks out of the box.
P.S. My repository shows a full example of how to use Databricks Repos and perform testing of the code in notebooks from a CI/CD pipeline.

Version of dependency vs. version of dependency of dependency

Let's say that I have dependency X version 1.0 and dependency Y version 1.0 in package.json. If Y requires X version 2.0 (which I know because I looked in package-lock.json), will I still be able to use X version 1.0 in my code without issues?
With a couple of assumptions about good module behavior, it is perfectly feasible for two modules of differing versions to be in use in the same app.
Here are some of the things a "good behavior" module must do in order to allow this:
Not use global symbols that would conflict between versions (stick to exports only). If everything is done via module exports, then two versions of the same module can run just fine. Each one will be separately imported and the code will use only the imports from the appropriate version.
Have the two versions installed in different directories or with different names. The idea here is that the path to the module must be different between the two because that's how module caching works. If the full filename is different (either because of a differing install path or a differing filename), then the module loader will happily load each one separately.
Not register anything globally that would conflict (like not both try to start a server on the same port).
Not try to write data to the same file. If all file activity is relative to the install directory for the module, then this should be safe. But, if the module assumes something about a known path or both are using a path from the same environment variables and they end up conflicting on writing data to the same file, that could cause problems.
Not try to write conflicting property names to the same object. For example if both versions were in action as Express middleware and both were trying to write different things to the req.someProp property that could cause problems. But, if both versions weren't in use on the same requests or both were being used for different functionality, then this could work just fine.
will I still be able to use X version 1.0 in my code without issues?
So, it's certainly possible, but it depends upon the behavior of the module and what exactly it does globally or with shared resources.

Freeling Python API working on sample, get Import error on other code

I'm trying out Freeling's API for python. The installation and test were ok, they provide a sample.py file that works perfectly (I've played around a little bit with it and it works).
So I was trying to use it in some other Python code I have, in a different folder (I'm kind of guessing this is a path issue), but whenever I import freeling (as shown in sample.py):
import freeling
FREELINGDIR = "/usr/local";
DATA = FREELINGDIR+"/share/freeling/";
LANG="es";
freeling.util_init_locale("default");
I get this error:
ModuleNotFoundError: No module named 'freeling'.
The sample.py is located in the ~/Freeling-4.0/APIs/Python/ folder, while my other file is located in ~/project/; I don't know if that can be an issue.
Thank you!
A simple solution is to have a copy of freeling.py in the same directory as your code, since python will look there.
A better solution is to either paste it in one of the locations where it usually checks (like the lib folder in its install directory), or to tell it that the path where your file is should be scanned for a module.
You can check out this question to see how it can be done on Windows. You are basically just setting the PYTHONPATH environment variable, and there will only be minor differences in how to do so for other OSes. This page gives instructions that should work on Linux systems.
I like this answer since it adds the path at runtime in the script itself, doesn't make persistent changes, and is largely independent of the underlying OS (apart from the fact that you need to use the appropriate module path of course).
You need to set PYTHONPATH so python can find the modules if they are not in the same folder.
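For illustration, both suggestions might look something like this; the Freeling API directory below is an assumption based on where sample.py lives in the question:
# Option 1: set PYTHONPATH before running the script (shell, not Python):
#   export PYTHONPATH="$HOME/Freeling-4.0/APIs/Python:$PYTHONPATH"
#   python3 ~/project/my_script.py
#
# Option 2: extend the module search path at runtime, inside the script itself:
import os
import sys

sys.path.append(os.path.expanduser("~/Freeling-4.0/APIs/Python"))

import freeling  # now resolvable from ~/project/ as well

freeling.util_init_locale("default")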

How to use certain node module

My goal is to read in graphml files and retrieve information about the fields that are contained within the nodes.
I have found a node module that seems to do just that so I decided I'd give it a try before writing my own code.
Unfortunately, the documentation is pretty poor and I have only been using node.js for about three months, so I cannot figure out what the few clues given mean.
Global Install
npm install -g graphml-schema-generator
To have the wizard assist you with questions, type gschema and answer the questions until finished.
Alternatively, you can use the following syntax: gschema path/to/graphml/file path/to/out/directory. The paths can be either relative or absolute; the tool should pick up on it either way.
Local Install
npm install --save graphml-schema-generator
You can either type the whole path to the file each time, such as ./node_modules/gml-to-typescript/build/index, or you can create an npm script that references the path.
However you choose to go about it, the usage is exactly the same as stated in the Global Install section. Please refer to that.
If you don't want to pollute your development environment this might be a better way to install this. Then you can use npm scripts to alias it to a more manageable command.
There's no GitHub repository or anything whatsoever. I have read through the code, and basically there are three files, of which two seem to be source files (exporting some modules that are used in the last one) and one that's called "app.ts".
I somehow expected that I could use this module like
import {<ModuleName>} from 'graphml-schema-generator';
or
require('./node_modules/graphml-schema-generator');
but this isn't the case. I do not understand what
gschema path/to/graphml/file path/to/out/directory
would mean or how it would be used. I guess there's some basic misunderstanding about packages on my side.
Here's an image of the module hierarchy.
So I want to understand how to use this module and, if I'm doing something wrong, what it is.
Thanks in advance

Copying base environment to create a new environment in Python

I'm working with some new libraries and I'm afraid that my script might run into trouble in the future with unexpected updates of packages. So I want to create a new environment, but I don't want to manually install all the basic packages like numpy, pandas, etc. So, does it make sense to create a new environment using conda that is an exact copy of my base environment, or could that create some sort of conflict?
Copying using conda works, but if you use only virtualenv, you should manually build a requirements.txt, create a new virtual environment, activate it, and then simply use pip install -r requirements.txt. Note the key word - manually.
For example if you needed requests, numpy and pandas, your requirements.txt would look like this:
requests==2.20.0
numpy==1.15.2
pandas==0.23.4
You could actually exclude numpy in this case (pandas pulls it in), but you keep it because you use it directly, and if you removed pandas you would still need it. I build the file by installing a new package, then using pip freeze to find the module I just installed and putting it into requirements.txt with its current version. Of course, if I ever get to the point where I share the project with someone, I replace == with >=; most of the time that's enough. If versions conflict, you need to check what the conflicting library requires and adjust if possible, e.g. you put in the latest numpy version as a requirement, but an older library specifically needs version x.y.z and your code is perfectly fine with that version too (the ideal case).
Anyway, this is all you have to keep around to preserve your virtual environment. It also helps if you are going to distribute your project, as anyone can drop this file into a new folder with your source and create their own environment without any hassle.
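As a rough sketch of that workflow in code (the env directory name and the presence of a requirements.txt are assumptions; on Windows the pip executable would be under env\Scripts\ instead of env/bin/):
# Recreate an environment from requirements.txt, roughly equivalent to
#   python3 -m venv env && env/bin/pip install -r requirements.txt
import subprocess
import venv

venv.create('env', with_pip=True)  # creates ./env with its own interpreter and pip
subprocess.check_call(['env/bin/pip', 'install', '-r', 'requirements.txt'])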
Now, this is why you should build it manually:
$ pip freeze
certifi==2018.10.15
chardet==3.0.4
idna==2.7
numpy==1.15.2
pandas==0.23.4
python-dateutil==2.7.3
pytz==2018.5
requests==2.20.0
six==1.11.0
urllib3==1.24
virtualenv==16.0.0
six? pytz? What? Other libraries use them, but we don't even know what they are for unless we look them up, and they shouldn't be listed as project dependencies - they get installed automatically because the libraries that actually need them declare them as dependencies.
This way you ensure that there won't be too many problems. Only in very rare cases will one library you use need a new version of a shared dependency while another library wants an ancient, conflicting version of it; that's a big mess when it happens, but normally it doesn't.
