I'm working with some new libraries and I'm afraid that my script might run into trouble in the future with unexpected package updates. So I want to create a new environment, but I don't want to manually install all the basic packages like numpy, pandas, etc. Does it make sense to create a new environment using conda that is an exact copy of my base environment, or could that create some sort of conflict?
Copying using conda works, but if you use only virtualenv, you should manually build a requirements.txt, create a new virtual environment, activate it, and then simply run pip install -r requirements.txt. Note the key word: manually.
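If you do go the conda route, the clone itself is a single command (myenv below is just a placeholder name):
conda create --name myenv --clone base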
For example if you needed requests, numpy and pandas, your requirements.txt would look like this:
requests==2.20.0
numpy==1.15.2
pandas==0.23.4
You could actually exclude numpy in this case, since pandas pulls it in, but keep it listed because you use it directly; if you ever removed pandas you would still need it. I build the file by installing a new package, then using pip freeze to find the version of the module I just installed, and putting that entry into requirements.txt. If I ever get to the point of sharing the project with someone, I replace == with >=; most of the time that's enough. If versions conflict, check what the conflicting library requires and adjust if possible, e.g. you pinned the latest numpy as a requirement, but an older library needs specifically version x.y.z and your code is perfectly fine with that version too (the ideal case).
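So if you later share the project, the same file would simply become:
requests>=2.20.0
numpy>=1.15.2
pandas>=0.23.4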
Anyway, this is all you have to keep around to preserve your virtual environment. It also helps if you are going to distribute your project, as anyone can drop this file into a new folder next to your source and create their own environment without any hassle.
Now, this is why you should build it manually:
$ pip freeze
certifi==2018.10.15
chardet==3.0.4
idna==2.7
numpy==1.15.2
pandas==0.23.4
python-dateutil==2.7.3
pytz==2018.5
requests==2.20.0
six==1.11.0
urllib3==1.24
virtualenv==16.0.0
six? pytz? What? Other libraries use them, but we don't even know what they are for unless we look them up, and they shouldn't be listed as project dependencies; they will be installed automatically by the packages that depend on them.
This way you avoid most problems. Conflicts only appear in the rare case where one library you use needs a new version of a shared dependency while another library insists on an ancient, incompatible version; that is a big mess, but normally it doesn't happen.
I am writing software with PySide6. On my Mac the package has a size of 1.0 GiB. Is there a way to easily remove unnecessary files that I don't need to package?
I manually identified the files below as not necessary for my software. Still, I end up with more than 500 MB.
/Assistant.app
/Designer.app
/Linguist.app
/lupdate
/QtWebEngineCore
/QtWebEngineCore.framework
You can install only the PySide6-Essentials package from PyPI.
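For example (the package name as published on PyPI):
pip install PySide6-Essentials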
You can also build from source and include, via the Qt installer, just what you need.
P.S. If you are struggling with building PySide from source, I have a repo that might help.
I have a python package on github, and I can install different commit versions of it using e.g. pip3 install git+https://github.com/my/package@commithash. I would like to benchmark various commits against each other, ideally comparing two versions within the same python script, so that I can plot metrics from different versions against each other. To me, the most obvious way to do this would be to install multiple versions of the same package simultaneously and access them with syntax something like
import mypackage_commithash1 as p1
import mypackage_commithash2 as p2
results1 = p1.do_something()
results2 = p2.do_something()
plot_comparison(results1, results2)
But as far as I can see, Python doesn't support multiple packages of the same name like this, although https://pypi.org/project/pip3-multiple-versions goes some of the way. Does anyone have suggestions for ways to do this sort of comparison within a python script?
That's too broad of a question to give a clear answer...
Having two versions of the same project running in the same environment and the same interpreter session is difficult, near impossible.
First, maybe have a look at this potentially related question:
Versions management of python packages
1. From reading your question, another solution that comes to mind would be to install the 2 versions of the project in 2 different virtual environments. Then in a 3rd virtual environment I would run code that looks like this (kind of untested pseudo-code, some tweaking will be required):
import subprocess

# Paths to the virtual environments holding the two versions of the package.
environments = [
    'path/to/env1',
    'path/to/env2',
]

results = []
for environment in environments:
    # Run the benchmark snippet with each environment's own interpreter.
    output = subprocess.check_output(
        [
            environment + '/bin/python',
            '-c',
            'import package; print(package.do_something())',
        ],
    )
    results.append(parse_output(output))  # parse_output: your own parsing helper

plot_comparison(results)  # plot_comparison: your own plotting helper
2. Another approach would be to use tox to run the test program in different environments, each containing a different version of the project. Then have an extra environment run the code that interprets and compares the results (maybe written to the filesystem?).
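A rough, untested sketch of what such a tox.ini could look like (the commit hashes are placeholders, and run_benchmark.py is a hypothetical script that writes its metrics to the filesystem for the comparison step to pick up):
[tox]
envlist = v1, v2
skipsdist = true

[testenv:v1]
deps = git+https://github.com/my/package@commithash1
commands = python run_benchmark.py

[testenv:v2]
deps = git+https://github.com/my/package@commithash2
commands = python run_benchmark.py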
3. Maybe one could try to hack something together with importlib. Install the 2 versions under 2 different paths (pip install --target ...). Then, in the test code, do something like this (see the sketch after these steps):
modify sys.path to include the path containing version 1
import (maybe importlib can help)
run test 1
modify sys.path to remove the path containing version 1 and include the path containing version 2
import again (maybe importlib.reload is necessary)
run test 2
compare results
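A minimal, untested sketch of those steps, assuming the two versions were installed with pip install --target path/to/v1 and pip install --target path/to/v2, and that the package's import name is package (a placeholder):
import importlib
import sys

# Put version 1 first on the path and import it.
sys.path.insert(0, 'path/to/v1')
import package
results1 = package.do_something()

# Swap the paths so the reload resolves to version 2.
sys.path.remove('path/to/v1')
sys.path.insert(0, 'path/to/v2')
package = importlib.reload(package)  # only reloads the top-level module; already-imported submodules may stay at version 1
results2 = package.do_something()

plot_comparison(results1, results2)  # plot_comparison: your own plotting helper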
I edited the Keras .optimizers and .layers modules locally, but Colab uses its own Keras & TensorFlow libraries. Uploading and then using the edited libs would be rather involved given pathing and package interactions, and overkill for a few small edits.
The closest I've got to accessing a module is keras.optimizers.__file__, which gives a path I don't know what to do with: '/usr/local/lib/python3.6/dist-packages/keras/optimizers.py'
Can Colab libraries be edited? Permanently (not per-runtime)?
Colab now allows direct access to system files from the GUI itself. There one can view and edit all the installed libraries, just as one would on their own PC.
Go to the Files icon in the left sidebar, go up one folder, and from there navigate to the path
usr/local/lib/python3.6/dist-packages
Here, find the package and make your edit.
Then restart the runtime, via the Runtime > Restart runtime option in the menu.
You could fork the libraries on GitHub, push your changes to a new branch and then run:
!pip install git+https://github.com/your-username/keras.git@new-branch
Or even a specific commit:
!pip install git+https://github.com/your-username/keras.git@632560d91286
You will need to restart your runtime for the changes to work.
More details here.
Per-runtime solution
import keras.optimizers

# Read the edited module, uploaded to Colab as 'optimizers.txt' ...
with open('optimizers.txt', 'r') as source_file:
    contents_to_write = source_file.read()
# ... and overwrite Colab's installed keras/optimizers.py with it.
with open(keras.optimizers.__file__, 'w') as file_to_overwrite:
    file_to_overwrite.write(contents_to_write)
Then restart the runtime (do not use 'Reset all runtimes').
To clarify: (1) save the edited module of interest as a .txt file, (2) overwrite the Colab module with the saved module via its .__file__ path, (3) 'Reset all runtimes' restores the original Colab modules; use it if a module breaks.
Considering its simplicity, it's as good as a permanent fix. For possibly better scalability, see fizzybear's solution.
I am pretty new to Haskell as well as stack.
import Data.Set
import Data.Stack
The statements above trigger a compilation error: Could not find module 'Data.Set'. Perhaps you meant 'Data.Int'. I tried to google it and found nothing similar.
Hence, my question is: do I need to specify external dependencies manually, or does my stack build command somehow fail to grab the appropriate modules from some cache or repository?
In case I have to specify my dependencies manually, should I prefer .cabal or .yaml? What's the correct way to deal with versioning?
[Do] I need to specify external dependencies manually [...]?
Yes.
Since you are using Stack, it is easy to specify the packages whose modules you import in your code. Depending on your Stack version, the default configuration might be a little bit different:
If you created your project with the latest version of Stack, you will see package.yaml in the root of your project (hpack is used in this case to specify the configurations). You need to add package dependencies there, e.g., containers for Data.Set. Here's an example of a dependencies section in one of my projects:
dependencies:
- base >= 4.7 && < 5
- containers
- time
- network
- bytestring
If you are using an older version of stack and do not see package.yaml, you need to edit your-project-name.cabal to add the dependencies. Here's the complete document telling you how to do it: https://docs.haskellstack.org/en/stable/GUIDE/#adding-dependencies
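For the .cabal route, the same dependencies go under the build-depends field of the relevant library or executable stanza, e.g. (a minimal sketch of just that field):
  build-depends:
      base >= 4.7 && < 5
    , containers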
Personally, I prefer the new system with hpack (package.yaml). Basically, it is simpler, and it saves you from having to declare every module you have (which is unrelated to dependencies). If you have package.yaml, do not edit the .cabal file, which is automatically generated by hpack.
I have developed a module (M.hs) which depends upon 3 other modules (A.hs, B.hs and C.hs). Now I want to use the module M across multiple other projects, so I have to install it. But for learning purposes I don't want to use cabal; I want to do it manually and install it in my home directory.
What is the proper course of action? Which files need to be created or copied, and where? How do I then use this module in another project?
Additional info:
I am using Debian 6
I am using GHC 6.12
You say you don’t want to use cabal, but would you use Cabal?
cabal is the name of the command line tool provided by cabal-install which can download packages from Hackage and resolve dependencies.
Cabal is the library that Haskell code uses to drive the compilation (e.g. pre-process files, build in the right order, build variants, generate documentation) and install into the right location.
I would not recommend avoiding Cabal, even for learning purposes, unless you want to write a replacement for it. But if you really want to do it, here is a rough outline, with just enough detail left out to make it a good learning experience:
Build your files with -package-name yourpkgname-version.
Link the generated files to form a libyourpkgname-version.a file.
Create a package configuration file like /var/lib/ghc/package.conf.d/mtl-2.1.2.conf, and pay attention to the name, exposed-modules, import-dirs, library-dirs and hs-libraries fields.
Register the package by passing that configuration file to ghc-pkg register.
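To give an idea of the shape, here is a rough, untested sketch of such a configuration file for this question's module M (the name, version and paths are placeholders, and a real file will also need fields omitted here, such as the depends list of the packages it was built against, so copy the full field set from an existing .conf like the mtl one above):
name: yourpkgname
version: 1.0
exposed: True
exposed-modules: M
hidden-modules: A B C
import-dirs: /home/you/haskell/lib/yourpkgname-1.0
library-dirs: /home/you/haskell/lib/yourpkgname-1.0
hs-libraries: yourpkgname-1.0
Here hs-libraries names the libyourpkgname-1.0.a archive from the earlier step, without the lib prefix and the .a suffix.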