How can I generate a PDF with custom fonts using AWS Lambda? - node.js

I have an AWS Lambda function that generates PDFs using the html-pdf library with custom fonts.
At first I imported my fonts externally from Google Fonts, but that made the PDF roughly ten times larger.
So I tried importing the fonts locally with src('file:///var/task/fonts/...ttf/woff2'), but still no luck.
Lastly, I tried creating a fonts folder in the main project, added all of my fonts, plus the file fonts.config:
<?xml version="1.0"?>
<!DOCTYPE fontconfig SYSTEM "fonts.dtd">
<fontconfig>
  <dir>/var/task/fonts/</dir>
  <cachedir>/tmp/fonts-cache/</cachedir>
  <config></config>
</fontconfig>
and set the following environment variable:
FONTCONFIG_PATH = /var/task/fonts
but still no luck (I haven't installed fontconfig, since I'm not sure how to, or whether I even need to).
My runtime environment is Node.js 8.1.0.

You can upload your fonts to an S3 bucket and then download them to the Lambda's /tmp directory during its execution. In case your library creates .pkl files, you should first change your working directory to /tmp (Lambda is not allowed to write to the default working directory).
The following Python code downloads your files from a fonts/ directory in an S3 bucket to the "local" directory /tmp/fonts.
import os
import boto3

# /tmp is the only writable location in Lambda
os.chdir('/tmp')
os.mkdir(os.path.join('/tmp/', 'fonts'))

s3 = boto3.resource('s3')
s3_client = boto3.client('s3')

my_bucket = s3.Bucket("bucket_name")

for file in my_bucket.objects.filter(Prefix="fonts/"):
    filename = file.key
    short_filename = filename.replace('fonts/', '')
    if len(short_filename) > 0:
        s3_client.download_file(
            "bucket_name",
            filename,
            "/tmp/fonts/" + short_filename,
        )
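If your PDF renderer resolves fonts through fontconfig (as in the question), you will also need to point fontconfig at the downloaded files. A minimal sketch, assuming a fonts.conf modelled on the question's config with /var/task swapped for /tmp (the exact config contents are an assumption, not part of the answer above):

import os

# hypothetical fonts.conf, adapted from the question's config to use /tmp
fonts_conf = """<?xml version="1.0"?>
<!DOCTYPE fontconfig SYSTEM "fonts.dtd">
<fontconfig>
  <dir>/tmp/fonts/</dir>
  <cachedir>/tmp/fonts-cache/</cachedir>
</fontconfig>
"""

with open('/tmp/fonts/fonts.conf', 'w') as conf:
    conf.write(fonts_conf)

# tell fontconfig where to find the config and cache before rendering the PDF
os.environ['FONTCONFIG_PATH'] = '/tmp/fonts'
os.environ['FONTCONFIG_FILE'] = '/tmp/fonts/fonts.conf'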

Related

Update file in folder inside the s3bucket python

I have a folder inside the S3 bucket, and I need to update a file inside that existing folder using Python. Can anyone assist?
I found an answer to my own question. The code below works perfectly
from my local folder. I open the folder and use 'rb' (read binary) to read each file.
import os
import shutil

import boto3

s3 = boto3.client(
    's3',
    region_name='ap-south-1',
    aws_access_key_id=S3_access_key,
    aws_secret_access_key=S3_secret_key,
)

path_data = os.path.join(MEDIA_ROOT, "screenShots", folder_name)

# upload every file in the local folder under the same folder name in S3
for file in os.listdir(path_data):
    with open(path_data + "/" + file, 'rb') as data:
        s3.upload_fileobj(data, s3_Bucket_name, folder_name + "/" + file)

# remove the local folder once everything is uploaded
shutil.rmtree(path_data)
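Side note (not part of the answer): boto3 can also resolve credentials from the environment, shared config, or an attached IAM role, so hard-coding keys is optional:

import boto3

# credentials are picked up from the environment / IAM role instead of code
s3 = boto3.client('s3', region_name='ap-south-1')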

Python selenium - How to store file on s3 from firefox download

I am using Python 3.8 + the Selenium package on AWS Lambda with the Firefox driver.
How can I set the download location for the Firefox profile in Selenium?
profile = webdriver.FirefoxProfile()
profile.set_preference("browser.downloads.folderList", 2)
profile.set_preference("browser.downloads.manager.showWhenStarting", False)
profile.set_preference("browser.downloads.dir","HOW TO SET S3 PATH HERE")
If that's not possible, what is the best way to implement it?
Selenium doesn't know anything about AWS or S3. You can download files to the local filesystem, then upload to S3 with boto3. For example:
import logging
import tarfile

import boto3
from botocore.exceptions import ClientError

log = logging.getLogger(__name__)

profile.set_preference("browser.downloads.dir", DOWNLOAD_DIRECTORY)

# when the Selenium run is complete, create a gzipped tar file of the downloads
tarball = "{0}.tar.gz".format(DOWNLOAD_DIRECTORY)
with tarfile.open(tarball, "w:gz") as f:
    f.add(DOWNLOAD_DIRECTORY)

client = boto3.client("s3")
try:
    client.upload_file(tarball, S3_BUCKET, s3_key_name)
except ClientError as e:
    log.error(e)
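For context, a rough sketch of the pieces the snippet above assumes; the names are placeholders, and on Lambda the download directory has to live under /tmp, the only writable path:

import os

# hypothetical placeholders used by the snippet above
DOWNLOAD_DIRECTORY = "/tmp/downloads"       # Lambda can only write under /tmp
S3_BUCKET = "my-download-bucket"            # hypothetical bucket name
s3_key_name = "selenium/downloads.tar.gz"   # hypothetical object key

os.makedirs(DOWNLOAD_DIRECTORY, exist_ok=True)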

Boto3 - Multi-file upload to specific S3 bucket "path" using CLI arguments

New coder here. For work, I received a request to put certain files in an already established S3 bucket with a requested "path."
For example: "Create a path of (bucket name)/1/2/3/ with folder3 containing (requested files)"
I'm looking to create a Python3 script to upload multiple files from my local machine to a specified bucket and "path" using CLI arguments specifying the file(s), bucket name, and "path"/key - I understand s3 doesn't technically have a folder structure, and that you have to put your "folders" in as part of the key, which is why I put "path" in quotes.
I have a working script doing what I want it to do, but the bucket/key is hard coded at the moment and I'm looking to get away from that with the use and understanding of CLI arguments. This is what I have so far -- it just doesn't upload the file, though it builds the path in s3 successfully :/
EDIT: Below is the working version of what I was looking for!
import argparse

import boto3


def upload_to_s3(file_name, bucket, path):
    s3 = boto3.client('s3')
    s3.upload_file(file_name, bucket, path)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('--file_name')
    parser.add_argument('--bucket')
    parser.add_argument('--path')
    args = parser.parse_args()
    upload_to_s3(args.file_name, args.bucket, args.path)
my input is:
>>> python3 s3_upload_args_experiment.py --file_name test.txt --bucket mybucket2112 --path 1/2/3/test.txt
Everything executes properly!
Thank you much!
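Since the original request mentioned multiple files, here is a hedged variation (not part of the accepted answer) that accepts several --file_name values via argparse's nargs and keys each object under the given "path" prefix:

import argparse
import os

import boto3


def upload_many_to_s3(file_names, bucket, path):
    # sketch: upload each local file under the requested "path" prefix
    s3 = boto3.client('s3')
    for file_name in file_names:
        key = path.rstrip('/') + '/' + os.path.basename(file_name)
        s3.upload_file(file_name, bucket, key)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('--file_name', nargs='+')
    parser.add_argument('--bucket')
    parser.add_argument('--path')
    args = parser.parse_args()
    upload_many_to_s3(args.file_name, args.bucket, args.path)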

Can't load PDF with Wand/ImageMagick in Google Cloud Function

Trying to load a PDF from the local file system and getting a "not authorized" error:
File "/env/local/lib/python3.7/site-packages/wand/image.py", line 4896, in read
    self.raise_exception()
File "/env/local/lib/python3.7/site-packages/wand/resource.py", line 222, in raise_exception
    raise e
wand.exceptions.PolicyError: not authorized `/tmp/tmp_iq12nws' # error/constitute.c/ReadImage/412
The PDF file is successfully saved to the local 'server' from GCS but won't be loaded by Wand. Loading images into OpenCV isn't an issue; the error only happens when trying to load PDFs using Wand/ImageMagick.
Code to load the PDF from GCS to local file system into Wand/ImageMagick is below
_, temp_local_filename = tempfile.mkstemp()
gcs_blob = STORAGE_CLIENT.bucket('XXXX').get_blob(results["storedLocation"])
gcs_blob.download_to_filename(temp_local_filename)
# load the pdf into a set of images using imagemagick
with Image(filename=temp_local_filename, resolution=200) as source:
    # run through pages and save images etc.
ImageMagick should be authorised to access files on the local filesystem, so it should load the file without issue rather than raising this 'not authorized' error.
PDF reading has been disabled in ImageMagick because of a security vulnerability in Ghostscript. The restriction is by design, and the ImageMagick team's mitigation will remain in place until ImageMagick re-enables Ghostscript processing of PDFs and Google Cloud Functions updates to that new version of ImageMagick.
There's no fix for the ImageMagick/Wand issue in GCF that I could find, but as a workaround for converting PDFs to images in Google Cloud Functions, you can use the ghostscript Python wrapper to request the PDF-to-image conversion from Ghostscript directly and bypass ImageMagick/Wand. You can then load the PNGs into ImageMagick or OpenCV without issue.
requirements.txt
google-cloud-storage
ghostscript==0.6
main.py
import glob
import locale
import os
import tempfile

import cv2
import ghostscript

# create a temp filename and save a local copy of the pdf from GCS
_, temp_local_filename = tempfile.mkstemp()
gcs_blob = STORAGE_CLIENT.bucket('XXXX').get_blob(results["storedLocation"])
gcs_blob.download_to_filename(temp_local_filename)

# create a temp folder to hold the exported pages
temp_local_dir = tempfile.mkdtemp()

# use ghostscript to export the pdf into pages as pngs in the temp dir
args = [
    "pdf2png",  # actual value doesn't matter
    "-dSAFER",
    "-sDEVICE=pngalpha",
    "-o", os.path.join(temp_local_dir, "page-%03d.png"),
    "-r300", temp_local_filename,
]

# the above arguments have to be bytes, encode them
encoding = locale.getpreferredencoding()
args = [a.encode(encoding) for a in args]

# run the request through ghostscript
ghostscript.Ghostscript(*args)

# read the files in the temp dir and process the pngs individually
for png_file_loc in glob.glob(os.path.join(temp_local_dir, "*.png")):
    # loop through the saved PNGs, load into OpenCV and do what you want
    cv_image = cv2.imread(png_file_loc, cv2.IMREAD_UNCHANGED)
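The same exported pages can also be loaded back into Wand/ImageMagick rather than OpenCV, since they are plain PNGs; a small sketch (not from the original answer):

from wand.image import Image as WandImage

# sketch: load each exported page with Wand instead of OpenCV
for png_file_loc in glob.glob(os.path.join(temp_local_dir, "*.png")):
    with WandImage(filename=png_file_loc) as page:
        page.format = 'jpeg'
        page.save(filename=png_file_loc.replace('.png', '.jpg'))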
Hope this helps someone facing the same issue.

Zipfile file in cloud(amazon s3) without writing it first to local file(no write privileges)

I need to zip some files in Amazon S3 without writing them to a local file first. My code worked in development, but I don't have write privileges in production.
import os
import zipfile
from io import BytesIO

folder = output_dir
files = fs.glob(folder)

f = BytesIO()
zip = zipfile.ZipFile(f, 'a', zipfile.ZIP_DEFLATED)

for file in files:
    filename = os.path.basename(file)
    # download the S3 object locally, then add it to the in-memory archive
    image = fs.get(file, filename)
    zip.write(filename)

zip.close()
The problem in production is at this line:
image = fs.get(file, filename)
because I don't have write privileges.
My last resort is to write to the /tmp/ directory, which I do have privileges for.
Is there a way to zip files from a URL path or directly in the cloud?
I ended up using Python's tempfile module, which turned out to be a perfect solution.
Using NamedTemporaryFile gave me the guarantee of creating named, system-visible temporary files that are deleted automatically. No manual work.
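A minimal sketch of that tempfile-based approach, assuming `fs` and `output_dir` are the same s3fs-style filesystem object and folder from the question (the archive name at the end is hypothetical):

import os
import tempfile
import zipfile

files = fs.glob(output_dir)

with tempfile.NamedTemporaryFile(suffix='.zip') as tmp_zip:
    with zipfile.ZipFile(tmp_zip, 'w', zipfile.ZIP_DEFLATED) as archive:
        for file in files:
            filename = os.path.basename(file)
            # NamedTemporaryFile lives under /tmp by default, so no extra
            # write privileges are needed; it is deleted automatically
            with tempfile.NamedTemporaryFile() as tmp_file:
                fs.get(file, tmp_file.name)
                archive.write(tmp_file.name, arcname=filename)
    # upload the finished archive back to the cloud (hypothetical key)
    fs.put(tmp_zip.name, output_dir + '/archive.zip')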
