Import Scripts From Folders Dynamically in For Loop - python-3.x

I have my main.py script in a folder, along with about 10 other folders with varying names. These names change from time to time, so I can't just import from a specific folder name each time. Instead, I thought I could create a for loop that would first load all the folder names into a list, then iterate over them to import the template.py script inside each folder. And yes, they are all named template.py, but each folder's copy is unique to that folder.
My main.py script looks like this:
import os
import sys

# All items in the current directory that have no dot in their name and are
# not the __pycache__ folder are treated as folders to iterate through
pipeline_folder_names = [name for name in os.listdir("./") if '.' not in name and 'pycache' not in name]

for i in pipeline_folder_names:
    print(i)
    sys.path.insert(0, './' + i)
    import template
It works on the first folder just fine, but then doesn't change into the next directory to import the next template script. I've tried adding both:
os.chdir('../')
and
sys.path.remove('./' + i)
to the end to "reset" the directory but neither of them work. Any ideas? Thanks!

When you import a module in Python, it's loaded into a cache. The second time you import template, it's not the new file that's imported; Python just reuses the one it already loaded.
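You can see this cache in sys.modules; a quick illustration (using os as a stand-in for template):

import sys

import os  # stand-in for template; any module behaves the same way

# After the first import, the module object is cached:
print('os' in sys.modules)            # True

# A second import is just a cache lookup, not a fresh load:
import os as os_again
print(os_again is sys.modules['os'])  # True: the very same object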
This is what worked for me.
The directory structure and content:
.
├── 1
│   ├── __pycache__
│   │   └── template.cpython-38.pyc
│   └── template.py
├── 2
│   ├── __pycache__
│   │   └── template.cpython-38.pyc
│   └── template.py
└── temp.py
$ cat 1/template.py
print("1")
$ cat 2/template.py
print("2")
Load the first one manually, then use the reload function from importlib to load each new template.py file.
import os
import sys
import importlib

# All items in the current directory that have no dot in their name and are
# not the __pycache__ folder are treated as folders to iterate through
pipeline_folder_names = [name for name in os.listdir("./") if '.' not in name and 'pycache' not in name]

sys.path.insert(1, './' + pipeline_folder_names[0])
import template
sys.path.remove('./' + pipeline_folder_names[0])

for i in pipeline_folder_names[1:]:
    sys.path.insert(0, './' + i)
    importlib.reload(template)
    sys.path.remove('./' + i)
Running this gives the output:
$ python temp.py
1
2

Considering the above folder structure: you need to make each folder a package, which can be done by creating an empty __init__.py file in each folder, alongside template.py. Then the code below in temp.py will solve your issue:
import os
import sys
import importlib

pipeline_folder_names = [name for name in os.listdir("./") if '.' not in name and 'pycache' not in name]

def import_template(directory):
    importlib.import_module(directory + '.template')

for i in pipeline_folder_names:
    import_template(i)
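If you'd rather not add an __init__.py to every folder, loading each template.py directly from its file path sidesteps both the package requirement and the module cache. A minimal sketch (the template_{i} module names are illustrative):

import os
import importlib.util

pipeline_folder_names = [name for name in os.listdir("./") if '.' not in name and 'pycache' not in name]

for i in pipeline_folder_names:
    # Load each folder's template.py under a unique module name so that
    # successive imports never collide in sys.modules.
    spec = importlib.util.spec_from_file_location(f"template_{i}", os.path.join(i, "template.py"))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)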

Related

With AWS CDK Python, how to create a subdirectory, import a .py, and call a method in there?

I am attempting to get the simplest example of creating an S3 bucket with the AWS CDK for Python, with no luck.
I want to put the code that creates the bucket in another file (which lives in a subdirectory).
What I am doing works with every other Python project I have developed or started.
Process:
I created an empty directory: aws_cdk_python/. Then, inside that directory, I ran:
$ cdk init --language python
to lay out the structure. This created another subdirectory with the same name, aws_cdk_python/, and created a single .py within that directory where I could begin adding code in the __init__(self) method (constructor).
I was able to add code there to create an S3 bucket.
Then I created a subdirectory with an __init__.py and a file called create_s3_bucket.py.
I put the code to create an S3 bucket in this file, in a method called main.
file: create_s3_bucket.py
def main(self):
    <code to create s3 bucket here>
When I run the code, it will create the App Stack with no errors, but the S3 bucket will not be created.
Here is my project layout:
aws_cdk_python/
    setup.py
    aws_cdk_python/
        aws_cdk_python_stack.py
        my_aws_s3/
            create_s3_bucket.py
setup.py contains the following two lines:
package_dir={"": "aws_cdk_python"},
packages=setuptools.find_packages(where="aws_cdk_python"),
The second line here says to look in the aws_cdk_python/ directory and search recursively in its sub-folders for packages.
In aws_cdk_python_stack.py, I have this line:
from my_aws_s3.create_s3_bucket import CreateS3Bucket
then in __init__ in aws_cdk_python_stack.py, I instantiate the object:
my_aws_s3 = CreateS3Bucket()
and then I make a call like so:
my_aws_s3.main() <== code to create the S3 bucket is here
I have followed this pattern on numerous Python projects before, using find_packages() in setup.py.
I have also run:
$ python -m pip install -r requirements.txt
which should pick up the dependencies pointed to in setup.py.
Questions:
- Has anyone who uses the AWS CDK for Python done this, or does anyone have recommendations for code organization? I do not want all the code for the entire stack to be in the aws_cdk_python_stack.py __init__() method.
- Any ideas on why there is no error displayed in my IDE? All dependencies are resolved and methods found, but when I run, nothing happens.
- How can I see any error messages? None appear with $ cdk deploy; it just creates the stack, but not the S3 bucket, even though I have code to call and create an S3 bucket.
This is frustrating; it should work.
I have other sub-directories that I want to create under aws_cdk_python/aws_cdk_python/<dir>, put an __init__.py there (empty file), and import classes in the top-level aws_cdk_python_stack.py.
Any help to get this working would be greatly appreciated.
cdk.json looks like this (laid down from cdk init --language python):
{
  "app": "python app.py",
  "context": {
    "@aws-cdk/aws-apigateway:usagePlanKeyOrderInsensitiveId": true,
    "@aws-cdk/core:enableStackNameDuplicates": "true",
    "aws-cdk:enableDiffNoFail": "true",
    "@aws-cdk/core:stackRelativeExports": "true",
    "@aws-cdk/aws-ecr-assets:dockerIgnoreSupport": true,
    "@aws-cdk/aws-secretsmanager:parseOwnedSecretName": true,
    "@aws-cdk/aws-kms:defaultKeyPolicies": true,
    "@aws-cdk/aws-s3:grantWriteWithoutAcl": true,
    "@aws-cdk/aws-ecs-patterns:removeDefaultDesiredCount": true,
    "@aws-cdk/aws-rds:lowercaseDbIdentifier": true,
    "@aws-cdk/aws-efs:defaultEncryptionAtRest": true,
    "@aws-cdk/aws-lambda:recognizeVersionProps": true,
    "@aws-cdk/aws-cloudfront:defaultSecurityPolicyTLSv1.2_2021": true
  }
}
app.py looks like this
import os
from aws_cdk import core as cdk
from aws_cdk import core
from aws_cdk_python.aws_cdk_python_stack import AwsCdkPythonStack

app = core.App()
AwsCdkPythonStack(app, "AwsCdkPythonStack")
app.synth()
To date (Tue 2021-12-31), this has not been solved.
Not entirely sure, but I guess it depends on what your cdk.json file looks like. It contains the command to run for cdk deploy. E.g.:
{
  "app": "python main.py", <===== this guy over here assumes the whole app is instantiated by running main.py
  "context": {
    ...
  }
}
Since I don't see this entrypoint present in your project structure, it might be related to that.
Usually after running cdk init you should at least be able to synthesize. Typically app.py holds your main App() definition, and stacks and constructs go in subfolders. Stacks are often instantiated in app.py, and the constructs are instantiated in the stack definition files.
I hope it helped you a bit further!
Edit:
Just an example of a working tree is shown below:
aws_cdk_python
├── README.md
├── app.py
├── cdk.json
├── aws_cdk_python
│   ├── __init__.py
│   ├── example_stack.py
│   └── s3_stacks                  <= this is your subfolder with s3 stacks
│       ├── __init__.py
│       └── s3_stack_definition.py <== file with an s3 stack in it
├── requirements.txt
├── setup.py
└── source.bat
aws_cdk_python/s3_stacks/s3_stack_definition.py:
from aws_cdk import core as cdk
from aws_cdk import aws_s3

class S3Stack(cdk.Stack):
    def __init__(self, scope: cdk.Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        bucket = aws_s3.Bucket(self, "MyEncryptedBucket",
                               encryption=aws_s3.BucketEncryption.KMS)
app.py:
from aws_cdk import core
from aws_cdk_python.s3_stacks.s3_stack_definition import S3Stack

app = core.App()
S3Stack(app, "ExampleStack")
app.synth()
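With this layout, the "app": "python app.py" entry in cdk.json runs app.py, which instantiates S3Stack, so cdk synth and cdk deploy should pick up the bucket.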

Write dataframe to path outside current directory with a function?

I got a question that relates to (maybe is a duplicate of) this question here.
I'm trying to write a pandas dataframe to an Excel file (which doesn't exist beforehand) at a given path. Since I have to do it quite a few times, I'm wrapping it in a function. Here is what I do:
import pandas as pd

df = pd.DataFrame({'Data': [10, 20, 30, 20, 15, 30, 45]})

def excel_to_path(frame, path):
    writer = pd.ExcelWriter(path, engine='xlsxwriter')
    frame.to_excel(writer, sheet_name='Output')
    writer.save()

excel_to_path(df, "../foo/bar/myfile.xlsx")
I get thrown the error [Errno 2] No such file or directory: '../foo/bar/myfile.xlsx'. How come and how can I fix it?
EDIT: It works as long as the defined path is inside the current working directory. But I'd like to specify any given path instead. Ideas?
I usually get bitten by forgetting to create the directories. Perhaps the path ../foo/bar/ doesn't exist yet? Pandas will create the file for you, but not the parent directories.
To elaborate, I'm guessing that your setup looks like this:
.
└── src
    ├── foo
    │   └── bar
    └── your_script.py
with src being your working directory, so that foo/bar exists relative to you, but ../foo/bar does not - yet!
So you should add the foo/bar directories one level up:
.
├── foo_should_go_here
│   └── bar_should_go_here
└── src
    ├── foo
    │   └── bar
    └── your_script.py
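Alternatively, the function itself can create any missing parent directories before writing. A minimal sketch of the question's excel_to_path, assuming the same xlsxwriter engine (a context manager replaces the explicit writer.save() call):

import os
import pandas as pd

def excel_to_path(frame, path):
    # pandas creates the file but not the folders above it,
    # so create any missing parent directories first.
    os.makedirs(os.path.dirname(path) or '.', exist_ok=True)
    # The context manager saves and closes the writer on exit.
    with pd.ExcelWriter(path, engine='xlsxwriter') as writer:
        frame.to_excel(writer, sheet_name='Output')

excel_to_path(pd.DataFrame({'Data': [10, 20, 30]}), "../foo/bar/myfile.xlsx")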

ZipFile creating zip with all the folders in zip

From the Python docs, I picked the following snippet to zip a single file (for a Flask project).
I have to create a zip file in temploc here:
/home/workspace/project/temploc/zipfile.zip
And here is my file to be zipped:
/home/workspace/project/temploc/file_to_be_zipped.csv
from zipfile import ZipFile

def zip_file(self, output, file_to_zip):
    try:
        with ZipFile(output, 'w') as myzip:
            myzip.write(file_to_zip)
    except:
        return None
    return output
This code creates a zip file in temploc, but with the full directory structure of the zipped file's path.
def prepare_zip(self):
    cache_dir = app.config["CACHE_DIR"]  # -- /home/workspace/project/temploc
    zip_file_path = os.path.join(cache_dir, "zipfile.zip")
    input_file = '/home/workspace/project/temploc/file_to_be_zipped.csv'
    self.zip_file(zip_file_path, input_file)
But the above code creates a zip file with the given path's directory structure:
zipfile.zip
└── home
    └── workspace
        └── project
            └── temploc
                └── file_to_be_zipped.csv
But I want only this structure:
zipfile.zip
└── file_to_be_zipped.csv
I can't figure out what I'm missing.
You should use the second argument of ZipFile.write (arcname) to set the proper name of the file in the archive:
import os.path
...
myzip.write(file_to_zip, os.path.basename(file_to_zip))
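Applied to the zip_file method from the question, a minimal sketch (the bare except is narrowed to OSError while keeping the same return behavior):

import os.path
from zipfile import ZipFile

def zip_file(self, output, file_to_zip):
    try:
        with ZipFile(output, 'w') as myzip:
            # arcname controls the name stored inside the archive;
            # the basename drops the leading directories.
            myzip.write(file_to_zip, arcname=os.path.basename(file_to_zip))
    except OSError:
        return None
    return output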

Jinja can't find template path

I can't get Jinja2 to read my template file.
jinja2.exceptions.TemplateNotFound: template.html
The simplest way to configure Jinja2 to load templates for your application looks roughly like this:

from jinja2 import Environment, PackageLoader
env = Environment(loader=PackageLoader('yourapplication', 'templates'))

This will create a template environment with the default settings and a loader that looks up the templates in the templates folder inside the yourapplication python package. Different loaders are available and you can also write your own if you want to load templates from a database or other resources.

To load a template from this environment you just have to call the get_template() method, which then returns the loaded Template:

template = env.get_template('mytemplate.html')

My code:
env = Environment(loader=FileSystemLoader('frontdesk', 'templates'))
template = env.get_template('template.html')
My tree (I have activated the venv in frontdesk):
.
├── classes.py
├── labels.txt
├── payments.py
├── templates
├── test.py
└── venv
You are using the FileSystemLoader class which has the following init arguments:
class FileSystemLoader(BaseLoader):
    def __init__(self, searchpath, encoding='utf-8', followlinks=False):
You are initializing it with 2 arguments: frontdesk and templates, which basically does not make much sense, since the templates string would be passed as an encoding argument value. If you want to continue using FileSystemLoader as a template loader, use it this way:
from jinja2 import Environment, FileSystemLoader
env = Environment(loader=FileSystemLoader('frontdesk/templates'))
template = env.get_template('index.html')
Or, if you meant to use the PackageLoader class:
from jinja2 import Environment, PackageLoader
env = Environment(loader=PackageLoader('frontdesk', 'templates'))
template = env.get_template('index.html')
In this case you need to make sure frontdesk is a package - in other words, make sure you have an __init__.py file inside the frontdesk directory.

Read all files in a nested folder in Spark

If we have a folder folder containing all .txt files, we can read them all using sc.textFile("folder/*.txt"). But what if I have a folder folder containing even more folders named datewise, like 03, 04, ..., which further contain some .log files. How do I read these in Spark?
In my case, the structure is even more nested and complex, so a general answer is preferred.
If the directory structure is regular, let's say something like this:
folder
├── a
│   ├── a
│   │   └── aa.txt
│   └── b
│       └── ab.txt
└── b
    ├── a
    │   └── ba.txt
    └── b
        └── bb.txt
you can use * wildcard for each level of nesting as shown below:
>>> sc.wholeTextFiles("/folder/*/*/*.txt").map(lambda x: x[0]).collect()
[u'file:/folder/a/a/aa.txt',
u'file:/folder/a/b/ab.txt',
u'file:/folder/b/a/ba.txt',
u'file:/folder/b/b/bb.txt']
Spark 3.0 provides an option recursiveFileLookup to load files from recursive subfolders.
val df = sparkSession.read
  .option("recursiveFileLookup", "true")
  .option("header", "true")
  .csv("src/main/resources/nested")
This recursively loads the files from src/main/resources/nested and its subfolders.
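The same option works from PySpark; a minimal sketch, assuming Spark 3.0+ and an existing SparkSession named spark:

# PySpark equivalent of the Scala snippet above
df = (spark.read
      .option("recursiveFileLookup", "true")
      .option("header", "true")
      .csv("src/main/resources/nested"))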
If you want to use only files whose names start with "a", you can use
sc.wholeTextFiles("/folder/a*/*/*.txt") or sc.wholeTextFiles("/folder/a*/a*/*.txt")
as well. We can use * as a wildcard.
Note that sc.wholeTextFiles("/directory/201910*/part-*.lzo") returns (filename, content) pairs, one per whole file. If you want to load the contents of all matched files in a directory as lines, you should use
sc.textFile("/directory/201910*/part-*.lzo")
and set directory reading to recursive:
sc._jsc.hadoopConfiguration().set("mapreduce.input.fileinputformat.input.dir.recursive", "true")
TIP: Scala differs from Python; the setting below is for Scala:
sc.hadoopConfiguration.set("mapreduce.input.fileinputformat.input.dir.recursive", "true")
