I am trying to use the disk_usage function from the shutil package.
Python version is 3.10.4
the docs state:
shutil.disk_usage(path)
Return disk usage statistics about the given path as a named tuple with the attributes total, used and free, which are the amount of
total, used and free space, in bytes. path may be a file or a
directory.
New in version 3.3.
Changed in version 3.8: On Windows, path can now be a file or directory.
These 3 code snipped should output the same but don't.
Using Jupyter Notebook inside Visual Studio Code on Windows11
How do I get the size of a directory with this function?
Example1
from shutil import disk_usage
import os
print(disk_usage("C:/Users"))
print(disk_usage("C:"))
>usage(total=999424712704, used=624119431168, free=375305281536)
>usage(total=999424712704, used=624119435264, free=375305277440)
Example2
from shutil import disk_usage
import os
print(disk_usage("C:"))
print(disk_usage(r"C:\Users"))
> usage(total=999424712704, used=624119824384, free=375304888320)
> usage(total=999424712704, used=624119824384, free=375304888320)
Example3
reversing the order makes a difference
from shutil import disk_usage
import os
print(disk_usage(r"C:\Users"))
print(disk_usage("C:"))
> usage(total=999424712704, used=624120086528, free=375304626176)
> usage(total=999424712704, used=624120090624, free=375304622080)
The Issue is that the jupyter notebook is written to the disk, changing the usage. Also probably some other processes are storing (temp files) on C between the executions.
Using a path to a flashdrive or secondary disc will probably be more stable.
disk_usage only gives stats on the disk, not folders.
Related
I want to include a large python package: Yolov7-https://github.com/WongKinYiu/yolov7.git into my already existing ROS package. The large python package consists of many files which depend on each other. I want this to be included and available for use in my main program scripts/main.py, which runs as a ROS node.
File structure: https://imgur.com/a/THP5WvP
PROBLEM: Files in yolov7 importing other files in yolov7 need to have the pathname package.yolov7.file_name, while the whole package is actually written on the form: from file_name import ...
How can I make sure files in yolov7 actually can import other files in yolov7 without having to add the extra path to every import?
Example: instead of having to write: from package.yolov7.utils.datasets import letterbox
I want to be able to have: from utils.datasets import letterbox
I have tried editing CMakeLists.txt, setup.py, and package.xml, but for now they are unchanged in regards to yolov7.
I'm using ROS Noetic, running Ubuntu 5.4.123.
I want to import a standard library (of a given version) in a Databricks Notebook Job. I do not want to install the library everytime a job cluster is created for this job. I want to install the library in a DBFS location and import the library directly from that location (by changing sys.path or something similar).
This is working in local:
I installed a library in a given path using:
pip install --target=customLocation library=major.minor
Append custom Location to sys.path variable:
sys.path.insert(0, 'customLocation')
When I import the library and check the location, I get the expected response
import library
print(library.__file__)
#Output - customLocation\library\__init__.py
However, the same exact sequence does not work in Databricks:
I installed the library in DBFS location:
%pip install --target='dbfs/customFolder/' numpy==1.19.4
Append sys.path variable:
sys.path.insert(0, 'dbfs/customFolder/')
Tried to find numpy version and file location:
import numpy
print(numpy.__version__)
print(numpy.__file__)
#Output - 1.21.4 (Databricks Runtime 10.4 Default Numpy)
/dbfs/customFolder/numpy/__init__.py
The customFolder has version 1.19.4 and the imported numpy shows that location, but version number does not match?
How exactly is the imports working in databricks to create this behaviour?
I also tried importing using the importlib function and the result remains the same. Link - https://docs.python.org/3/library/importlib.html#importing-a-source-file-directly
import importlib.util
import sys
spec = importlib.util.spec_from_file_location('numpy', '/dbfs/customFolder/')
module = importlib.util.module_from_spec(spec)
sys.modules['numpy'] = module
spec.loader.exec_module(module)
import numpy
print(numpy.__version__)
print(numpy.__file__)
Related:
How to install Python Libraries in a DBFS Location and access them through Job Clusters (without installing it for every job)?
I am trying to load my saved model from s3 using joblib
import pandas as pd
import numpy as np
import json
import subprocess
import sqlalchemy
from sklearn.externals import joblib
ENV = 'dev'
model_d2v = load_d2v('model_d2v_version_002', ENV)
def load_d2v(fname, env):
model_name = fname
if env == 'dev':
try:
model=joblib.load(model_name)
except:
s3_base_path='s3://sd-flikku/datalake/doc2vec_model'
path = s3_base_path+'/'+model_name
command = "aws s3 cp {} {}".format(path,model_name).split()
print('loading...'+model_name)
subprocess.call(command)
model=joblib.load(model_name)
else:
s3_base_path='s3://sd-flikku/datalake/doc2vec_model'
path = s3_base_path+'/'+model_name
command = "aws s3 cp {} {}".format(path,model_name).split()
print('loading...'+model_name)
subprocess.call(command)
model=joblib.load(model_name)
return model
But I get this error:
from sklearn.externals import joblib
ImportError: cannot import name 'joblib' from 'sklearn.externals' (C:\Users\prane\AppData\Local\Programs\Python\Python37\lib\site-packages\sklearn\externals\__init__.py)
Then I tried installing joblib directly by doing
import joblib
but it gave me this error
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 8, in load_d2v_from_s3
File "/home/ec2-user/.local/lib/python3.7/site-packages/joblib/numpy_pickle.py", line 585, in load
obj = _unpickle(fobj, filename, mmap_mode)
File "/home/ec2-user/.local/lib/python3.7/site-packages/joblib/numpy_pickle.py", line 504, in _unpickle
obj = unpickler.load()
File "/usr/lib64/python3.7/pickle.py", line 1088, in load
dispatch[key[0]](self)
File "/usr/lib64/python3.7/pickle.py", line 1376, in load_global
klass = self.find_class(module, name)
File "/usr/lib64/python3.7/pickle.py", line 1426, in find_class
__import__(module, level=0)
ModuleNotFoundError: No module named 'sklearn.externals.joblib'
Can you tell me how to solve this?
You should directly use
import joblib
instead of
from sklearn.externals import joblib
It looks like your existing pickle save file (model_d2v_version_002) encodes a reference module in a non-standard location – a joblib that's in sklearn.externals.joblib rather than at top-level.
The current scikit-learn documentation only talks about a top-level joblib – eg in 3.4.1 Persistence example – but I do see a reference in someone else's old issue to a DeprecationWarning in scikit-learn version 0.21 about an older scikit.external.joblib variant going away:
Python37\lib\site-packages\sklearn\externals\joblib_init_.py:15:
DeprecationWarning: sklearn.externals.joblib is deprecated in 0.21 and
will be removed in 0.23. Please import this functionality directly
from joblib, which can be installed with: pip install joblib. If this
warning is raised when loading pickled models, you may need to
re-serialize those models with scikit-learn 0.21+.
'Deprecation' means marking something as inadvisable to rely-upon, as it is likely to be discontinued in a future release (often, but not always, with a recommended newer way to do the same thing).
I suspect your model_d2v_version_002 file was saved from an older version of scikit-learn, and you're now using scikit-learn (aka sklearn) version 0.23+ which has totally removed the sklearn.external.joblib variation. Thus your file can't be directly or easily loaded to your current environment.
But, per the DeprecationWarning, you can probably temporarily use an older scikit-learn version to load the file the old way once, then re-save it with the now-preferred way. Given the warning info, this would probably require scikit-learn version 0.21.x or 0.22.x, but if you know exactly which version your model_d2v_version_002 file was saved from, I'd try to use that. The steps would roughly be:
create a temporary working environment (or roll back your current working environment) with the older sklearn
do imports something like:
import sklearn.external.joblib as extjoblib
import joblib
extjoblib.load() your old file as you'd planned, but then immediately re-joblib.dump() the file using the top-level joblib. (You likely want to use a distinct name, to keep the older file around, just in case.)
move/update to your real, modern environment, and only import joblib (top level) to use joblib.load() - no longer having any references to `sklearn.external.joblib' in either your code, or your stored pickle files.
You can import joblib directly by installing it as a dependency and using import joblib,
Documentation.
Maybe your code is outdated. For anyone who aims to use fetch_mldata in digit handwritten project, you should fetch_openml instead. (link)
In old version of sklearn:
from sklearn.externals import joblib
mnist = fetch_mldata('MNIST original')
In sklearn 0.23 (stable release):
import sklearn.externals
import joblib
dataset = datasets.fetch_openml("mnist_784")
features = np.array(dataset.data, 'int16')
labels = np.array(dataset.target, 'int')
For more info about deprecating fetch_mldata see scikit-learn doc
none of the answers below works for me, with a little changes this modification was ok for me
import sklearn.externals as extjoblib
import joblib
for this error, I had to directly use the following and it worked like a charm:
import joblib
Simple
In case the execution / call to joblib is within another .py program instead of your own (in such case even you have installed joblib, it still causes error from within the calling python programme unless you change the code, i thought would be messy), I tried to create a hardlink:
(windows version)
Python> import joblib
then inside your sklearn path >......\Lib\site-packages\sklearn\externals
mklink /J ./joblib .....\Lib\site-packages\joblib
(you can work out the above using a ! or %, !mklink....... or %mklink...... inside your Python juptyter notebook , or use python OS command...)
This effectively create a virtual folder of joblib within the "externals" folder
Remarks:
Of course to be more version resilient, your code has to check for the version of sklearn is >= 0.23 again before hand.
This would be alternative to changing sklearn vesrion.
When getting error:
from sklearn.externals import joblib it deprecated older version.
For new version follow:
conda install -c anaconda scikit-learn (install using "Anaconda Promt")
import joblib (Jupyter Notebook)
I had the same problem
What I did not realize was that joblib was already installed!
so what you have to do is replace
from sklearn.externals import joblib
with
import joblib
and that is it
After a long investigation, given my computer setup, I've found that was because an SSL certificate was required to download the dataset.
I am new to python and i m having a really bad time to overcome a problem with the importing system.
Lets say i have the file system presented below:
/src
/src/main.py
/src/submodules/
/src/submodules/submodule.py
/src/submodules/subsubmodules
/src/submodules/subsubmodules/subsubmodule.py
All the folders (src, submodules, subsubmodules) have and empty __init__.py file.
In submodule.py i have:
from subsubmodules import subsubmodule
In main.py i have:
from submodules import submodule
When i run submodule.py python accepts the import. But when i run main.py python raises error for the import of subsubmodule.py because /src/submodules/subsubmodules/ folder is not in the path.
Only solution is to change the import of submodule.py to
from submodules.subsubmodules import subsubmodule
This seems to me as an awful solution because after that i cannot run submodule.py and i m sure that something else is the key to that.
An other solution is to add the following code to the __init__.py file:
import os
import sys
import inspect
cmd_subfolder = os.path.split(inspect.getfile(inspect.currentframe()))[0]
if cmd_subfolder not in sys.path:
sys.path.insert(0, cmd_subfolder)
Is there any way to do this using just the importing system of python and not other methods that do it manually using, for example sys.path or other modules like os, inspect etc..?
How can i import modules without caring about the modules they import?
You can run subsubmodule.py as
python3 -m submodule.subsubmodules.subsubmodule
If you want a shorter way to invoke it, you're free to add a shell or Python script for that on the top level of your package.
This is how imports work in Python 3; there are reasons for that.
You can avoid this issue by using sys.path in your program.
sys.path.insert(0, './lib')
import subsubmodule
For this code, you can put all your imports to a lib folder.
You can read the official documentation on Python packages where this is explained in depth.
I am using python 3.4 on windows 7. In order to open a doc file I am using this code:
import sys
import win32com.client as win32
word = win32.Dispatch("Word.Application")
word.Visible = 0
word.Documents.Open("MyDocument")
doc = word.ActiveDocument
I'M not sure why is this error popping up every time:
ImportError: no module named win32api
Although I have installed pywin32 from http://www.lfd.uci.edu/~gohlke/pythonlibs/#pywin32
and I have also checked the path from where I am importing. I have tried reinstalling pywin32 as well but that doesn't remove the error.
Try to install pywin32 from here :
http://sourceforge.net/projects/pywin32/files/pywin32/
depends on you operation system and the python version that you are using. Normally 32bit version should works on both 32 and 64 bit OS.
EDIT: moved to https://github.com/mhammond/pywin32/releases
This is a bug in the library itself, probably they used a different python implementation for creating this.
What they are trying to import is the site-packages\win32\win32api.pyd file, but the win32 folder is not in the path that python searches in, but site-packages is.
Try to replace the import win32api (inside win32com\__init__.py) to from win32 import win32api
I encountered the same error yestoday with Python 3.6.1 on Windows 7, and resolved it by "pip install pypiwin32".
Had the same error trying to import win32com.client (using Python 2.7, 64-bit). I agree with TulkinRB, there seem to be path issues, but the fix suggested did not work for me, since I also could not import win32.
Perhaps my fix will also work in Python 3.4.
Eventually, installing the .exe from SourceForge as an administrator (as suggested in Rina Rivera's answer here) allowed me to import win32com.client from IDLE, but not when I executed the script I was originally trying to run.
In the end, I discovered 3 differences in the sys.path that had been extended when I installed as admin and opened IDLE, but were not applied when executing a script. By extending the sys.path in my script, I was able to get rid of the import errors when I executed it:
import sys
sys.path.extend(('C:\\Python27\\lib\\site-packages\\win32', 'C:\\Python27\\lib\\site-packages\\win32\\lib', 'C:\\Python27\\lib\\site-packages\\Pythonwin'))
Finally, if you want more than a temporary fix, the sys.path could be permanently extended by establishing IDLESTARTUP or PYTHONSTARTUP variables (as described here and here).
You can create the __init.py file inside the win32 folder and then go inside the win32com folder and change its __init__.py file, where it is import win32api, change to from win32 import win32api
I ended up debugging and copying and pasting the necessary files into the appropriate folders. It's a work-around until the bug is fixed, but it works.
from https://github.com/mhammond/pywin32/issues/1151#issuecomment-360669440
append the 'pypiwin32_system32' path to your system PATH,
in a script this can be done like:
import os
sitedir='C:/where_ever/'
os.environ["PATH"]+=(';'+os.path.join(sitedir,"pypiwin32_system32"))
...
from powershell
$env:PATH="$PATH;C:\where_ever\pywin32_system32";
python.exe ...
for help on site dir, see What is python's site-packages directory?