python3, directory is not correct when importing a module in a sub-folder

I have a main folder called 'test'; its inner structure is:
# folders and files in the main folder 'test'
Desktop\test\use_try.py
Desktop\test\cond\__init__.py # empty file.
Desktop\test\cond\tryme.py
Desktop\test\db\
Now, in the file tryme.py, I want to generate a file in the folder 'db':
# content in the file of tryme.py
import os

def main():
    cwd = os.getcwd()  # the directory of the folder 'Desktop\test\cond'
    folder_test = cwd[:-4]  # -4 since 'cond' has 4 letters
    folder_db = folder_test + 'db/'  # the directory of folder 'db'
    with open(folder_db + 'db01.txt', 'w') as wfile:
        wfile.writelines(['This is a test.'])

if __name__ == '__main__':
    main()
If I run this file directly, there is no problem: the file 'db01.txt' ends up in the folder 'db'.
But if I run use_try.py, it does not work.
# content in the file of use_try.py
from cond import tryme
tryme.main()
The error I get points to the 'with open ...' line in tryme.py:
FileNotFoundError: [Errno 2] No such file or directory: 'Desktop\db\db01.txt'
It seems that os.getcwd() returns the directory of the file that calls tryme.py, not the directory of tryme.py itself.
How can I fix this, so that running use_try.py generates 'db01.txt' in the 'db' folder? I am using Python 3.
Thanks

Seems like what you need is not the working directory, but the directory of the tryme.py file.
This can be resolved using the __file__ magic:
curdir = os.path.dirname(__file__)
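Applied to the question, a minimal sketch of tryme.py using that approach (assuming the Desktop\test layout described above):

# content of tryme.py, resolving the db folder relative to this file rather than the working directory
import os

def main():
    here = os.path.dirname(os.path.abspath(__file__))  # .../Desktop/test/cond
    folder_db = os.path.join(here, os.pardir, 'db')    # .../Desktop/test/db
    with open(os.path.join(folder_db, 'db01.txt'), 'w') as wfile:
        wfile.writelines(['This is a test.'])

if __name__ == '__main__':
    main()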

This behavior is as expected: the current working directory is where you invoke the code from, not where the code is stored. Use absolute filenames from an environment variable, or expect the db/ directory to be a subdirectory of the current working directory.
folder_test = cwd  # assume the working directory will have the db/ subdir
or
folder_test = os.getenv('TEST_DIR')  # use ${TEST_DIR}/db/
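A minimal sketch of the environment-variable approach; TEST_DIR is assumed to be exported to point at Desktop\test before running use_try.py, and the fallback to the working directory is an addition:

import os

# TEST_DIR should point at the 'test' folder; fall back to the working directory if it is unset
folder_test = os.getenv('TEST_DIR', os.getcwd())
folder_db = os.path.join(folder_test, 'db')
with open(os.path.join(folder_db, 'db01.txt'), 'w') as wfile:
    wfile.writelines(['This is a test.'])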

Related

How to access the Hydra config object at runtime

I need to change the output/working directory of the hydra config framework in such a way that it lies outside of my project directory. According to my understanding and the doc, config.yaml would need to look like this:
exp_nr: 0.0.0.0
condition: something
hydra:
  run:
    dir: /absolute/path/to/folder/${exp_nr}/${condition}/
In my code, I then tried to access and set the path like this:
import os
import hydra
from omegaconf import DictConfig

@hydra.main(config_path="../../config", config_name="config", version_base="1.3")
def main(cfg: DictConfig):
    print(cfg)
    cwd = os.getcwd()
    print(f"The current working directory is {cwd}")
    owd = hydra.utils.get_original_cwd()
    print(f"The Hydra original working directory is {owd}")
    work_dir = cfg.hydra.run.dir
    print(f"The work directory should be {work_dir}")
But I get the following output and error:
{'exp_nr': '0.0.0.0', 'condition': 'something'}
The current working directory is /project/path/subdir/subsubdir
The Hydra original working directory is /project/path/subdir/subsubdir
Error executing job with overrides: ['exp_nr=1.0.0.0', 'condition=somethingelse']
Traceback (most recent call last):
  File "/project/path/subdir/subsubdir/model.py", line 13, in main
    work_dir = cfg.hydra.run.dir
omegaconf.errors.ConfigAttributeError: Key 'hydra' is not in struct
    full_key: hydra
    object_type=dict
I see that hydra.run.dir doesn't appear in the cfg dict printed first, but how can I access the path through the config if os.getcwd() isn't already set to it? Or what did I do wrong?
The path itself is correct: I had already saved files to the folder before integrating Hydra, and if the process isn't killed by the error the folder does get created, but Hydra doesn't save any files to it, not even the log file with the parameters that it should save by default. I also tried setting the path relative to the standard output path, and adding an extra config parameter work_dir: ${hydra.run.dir} (which returns an interpolation error).
You can access the Hydra config via the HydraConfig singleton documented here.
from hydra.core.hydra_config import HydraConfig

@hydra.main()
def my_app(cfg: DictConfig) -> None:
    print(HydraConfig.get().job.name)
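For the run directory the question is after, the same singleton should also expose hydra.run.dir; a minimal sketch reusing the question's decorator arguments (untested against that exact config):

import hydra
from hydra.core.hydra_config import HydraConfig
from omegaconf import DictConfig

@hydra.main(config_path="../../config", config_name="config", version_base="1.3")
def main(cfg: DictConfig) -> None:
    # the hydra.* keys are stripped from cfg itself, but remain available via the singleton
    work_dir = HydraConfig.get().run.dir
    print(f"The work directory should be {work_dir}")

if __name__ == "__main__":
    main()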

Python class cannot open file when object is created from directory below

I have an app.py file which uses a custom class that I created one level below, i.e.
from myfolder import util_file

def get_data():
    dataContainer = util_file.MyDataContainer()
    # do more stuff
One folder down, in util_file, I create MyDataContainer as a class that uses data from two local files to instantiate itself, i.e.
class MyDataContainer:
    def __init__(self):
        with open('file1.txt', 'r'):
            pass  # get needed init data, part 1
        # repeat operation with data in file2.txt
The object is created fine when I run from the file itself, i.e.
if __name__ == '__main__':
    testcont = MyDataContainer()
The issue is that when I run the code in app.py I get:
FileNotFoundError: [Errno 2] No such file or directory: 'file1.txt'
thrown at the line
with open('file1.txt', 'r'):
I am guessing that Python is unable to see file1.txt and file2.txt since they are in another directory at runtime, but I checked my path and the subfolder is in sys.path at runtime, so I believed they should be visible. I really do not understand what is going on here. Is there a different reason Python is unable to see these files?
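One likely explanation: sys.path only affects module imports, while open() with a relative path resolves against the current working directory, not against the directory of util_file.py. A minimal sketch of one way to make the class independent of the working directory, assuming file1.txt and file2.txt sit next to util_file.py (the attribute names are hypothetical):

from pathlib import Path

# directory that contains util_file.py (and, assumed here, the two data files)
_MODULE_DIR = Path(__file__).resolve().parent

class MyDataContainer:
    def __init__(self):
        with (_MODULE_DIR / 'file1.txt').open('r') as f1:
            self.part1 = f1.read()  # get needed init data, part 1
        with (_MODULE_DIR / 'file2.txt').open('r') as f2:
            self.part2 = f2.read()  # repeat the operation with data in file2.txt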

Python: Move files from local PC to server

I need to move files from my PC to a network location; however, when I execute the script I get an error. I have tested this on my PC, moving to a different local folder, and it works perfectly.
Here is my code which I got, and modified slightly, from https://thispointer.com/python-how-to-move-files-and-directories/ (giving credit to the author):
import shutil, os, glob, time

def moveAllFilesinDir(srcDir, dstDir):
    # Check if both paths are directories
    if os.path.isdir(srcDir) and os.path.isdir(dstDir):
        # Iterate over all the files in the source directory
        for filePath in glob.glob(srcDir + '\*'):
            # Move each file to the destination directory
            if os.path.getctime(filePath) != os.path.getmtime(filePath):
                shutil.move(filePath, dstDir)
    else:
        print("srcDir & dstDir should be directories")

sourceDir = r"C:\Folder A"
destDir = r"\\Server\Folder B"
moveAllFilesinDir(sourceDir, destDir)
Any help will be highly appreciated.
Update
I forgot to mention that I am making use of Remote Desktop to access the server.
Errors I receive:
FileNotFoundError: [WinError 67] The network name cannot be found.
FileNotFoundError: [Errno 2] No such file or directory
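One likely cause: a Remote Desktop session to the server does not make \\Server\Folder B reachable from the PC that runs the script, and WinError 67 means the network name could not be resolved from that machine. A minimal sketch that checks this before moving anything (paths are the placeholders from the question):

import os
import shutil

destDir = r"\\Server\Folder B"  # UNC path; it must resolve from the machine running the script

# os.path.isdir() returns False (rather than raising) when the network name
# cannot be found, so check reachability explicitly before attempting any moves.
if not os.path.isdir(destDir):
    raise FileNotFoundError(f"Network share not reachable: {destDir}")

shutil.move(r"C:\Folder A\example.txt", destDir)  # hypothetical single-file move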

Referencing a YAML config file, from Python, when a softlink is defined

I have the following code:
#!/usr/bin/env python3
import yaml

with open('config.yml', 'r') as config_file:
    config = yaml.load(config_file)
The file is called __init__.py and lives in the directory ~/bin/myprogram/myprogram/; in the same directory I have a file called config.yml.
My symlink is as follows:
user$ ls -la /usr/local/bin/
lrwxr-xr-x 1 user admin 55 27 Nov 13:25 myprogram -> /Users/user/bin/myprogram/myprogram/__init__.py
Every time I run myprogram, I get the error FileNotFoundError: [Errno 2] No such file or directory: 'config.yml'. I believe this is because the config.yml is not in /usr/local/bin/. What is the best way to work around this issue?
You can use __file__ to access the location of the __init__.py file when executing code in that file. It returns the full path, but care has to be taken as it may be the .pyc (or .pyo) version. Since you are using Python3 I would use the pathlib module:
import yaml
from pathlib import Path

my_path = Path(__file__).resolve()  # resolve to get rid of any symlinks
config_path = my_path.parent / 'config.yaml'

with config_path.open() as config_file:
    config = yaml.safe_load(config_file)
Please note:
If you have to use PyYAML, use safe_load(); even PyYAML's own documentation indicates that .load() can be unsafe, and it is almost never necessary to use it. In the unlikely event that safe_load() cannot load your config, e.g. if it has !!python/... tags, you should explicitly register the classes that you actually need with the SafeLoader (see the sketch after these notes).
Since September 2006 the recommended extension for YAML files has been .yaml.
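A minimal sketch of that registration (the Experiment class and its tag are made up for illustration): PyYAML's YAMLObject can be pointed at the SafeLoader so that safe_load() is able to construct it.

import yaml

class Experiment(yaml.YAMLObject):
    yaml_tag = '!Experiment'
    yaml_loader = yaml.SafeLoader  # register the constructor with the safe loader

    def __init__(self, name):
        self.name = name

# safe_load() can now construct Experiment objects from '!Experiment' tagged nodes
config = yaml.safe_load('!Experiment {name: demo}')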

How to add a module folder or tar.gz to nodes in PySpark

I am running PySpark in an IPython notebook after doing the following configuration:
export PYSPARK_DRIVER_PYTHON=/usr/local/bin/jupyter
export PYSPARK_DRIVER_PYTHON_OPTS="notebook --NotebookApp.open_browser=False --NotebookApp.ip='*' --NotebookApp.port=8880"
export PYSPARK_PYTHON=/usr/bin/python
I have a custom UDF which makes use of a module called mzgeohash, but I am getting a module-not-found error; I guess this module is missing on the workers/nodes. I tried sc.addPyFile and so on. What is an effective way to add a cloned folder or a tar.gz Python module in this case, from IPython?
Here is how I do it. Basically, the idea is to create a zip of all the files in your module and pass it to sc.addPyFile():
import os
import uuid
import zipfile

def ziplib():
    libpath = os.path.dirname(__file__)  # this should point to your packages directory
    zippath = '/tmp/mylib-' + uuid.uuid4().hex[:6] + '.zip'  # some random filename in a writable directory
    zf = zipfile.PyZipFile(zippath, mode='w')
    try:
        zf.debug = 3  # making it verbose, good for debugging
        zf.writepy(libpath)
        return zippath  # return path to the generated zip archive
    finally:
        zf.close()

...
zip_path = ziplib()     # generate zip archive containing your lib
sc.addPyFile(zip_path)  # add the entire archive to the SparkContext
...
os.remove(zip_path)     # don't forget to remove the temporary file, preferably in a "finally" clause
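If the module is already available as an archive (for example, a cloned repo zipped up), an alternative is to ship it when the SparkContext is created; the archive path below is a placeholder, and as far as I know the archive has to be a .zip, .egg or plain .py rather than a tar.gz:

from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName("udf-with-mzgeohash")
# pyFiles entries are shipped to every executor and put on the workers' PYTHONPATH
sc = SparkContext(conf=conf, pyFiles=['/path/to/mzgeohash.zip'])

# or, on an already-running context (e.g. the one the notebook provides):
# sc.addPyFile('/path/to/mzgeohash.zip')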
