Python Selenium - How to store a file on S3 from a Firefox download

I am using Python 3.8 with the Selenium package on AWS Lambda, with the Firefox driver.
So how can I set the download location for the Firefox profile in Selenium?
profile = webdriver.FirefoxProfile()
profile.set_preference("browser.download.folderList", 2)  # 2 = use a custom download directory
profile.set_preference("browser.download.manager.showWhenStarting", False)
profile.set_preference("browser.download.dir", "HOW TO SET S3 PATH HERE")
If that is not possible, what is the best way to implement this?

Selenium doesn't know anything about AWS or S3. You can download files to the local filesystem, then upload them to S3 with boto3. For example:
profile.set_preference("browser.downloads.dir", DOWNLOAD_DIRECTORY)
# when Selenium run is complete, create a gzipped tar file of downloads
tarball = "{0}.tar.gz".format(DOWNLOAD_DIRECTORY)
with tarfile.open(tarball, "w:gz") as f:
f.add(DOWNLOAD_DIRECTORY)
client = boto3.client("s3")
try:
client.upload_file(tarball, S3_BUCKET, s3_key_name)
except ClientError as e:
log.error(e)
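A caveat for the Lambda part of the question: /tmp is the only writable path in a Lambda function, so the download directory has to live there. A minimal sketch under that assumption (the DOWNLOAD_DIRECTORY value and the neverAsk preference are illustrative, not from the original code):
import os
from selenium import webdriver

# Assumption: running on AWS Lambda, where /tmp is the only writable location
DOWNLOAD_DIRECTORY = "/tmp/downloads"
os.makedirs(DOWNLOAD_DIRECTORY, exist_ok=True)

profile = webdriver.FirefoxProfile()
profile.set_preference("browser.download.folderList", 2)  # 2 = use a custom directory
profile.set_preference("browser.download.dir", DOWNLOAD_DIRECTORY)
# skip the save dialog for this MIME type so downloads complete unattended
profile.set_preference("browser.helperApps.neverAsk.saveToDisk", "application/octet-stream")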

Related

Update a file in a folder inside an S3 bucket using Python

I have a folder inside an S3 bucket, and I need to update a file inside that existing folder using Python. Can anyone assist?
I found an answer to my own question. The code below works perfectly from my local folder: I open each file in the folder with 'rb' (read binary) and upload it.
import os
import shutil
import boto3

s3 = boto3.client(
    's3',
    region_name='ap-south-1',
    aws_access_key_id=S3_access_key,
    aws_secret_access_key=S3_secret_key,
)

# MEDIA_ROOT, folder_name, S3_access_key, S3_secret_key and s3_Bucket_name
# come from the surrounding project settings
path_data = os.path.join(MEDIA_ROOT, "screenShots", folder_name)
for file in os.listdir(path_data):
    with open(os.path.join(path_data, file), 'rb') as data:
        s3.upload_fileobj(data, s3_Bucket_name, folder_name + "/" + file)

# remove the local folder once everything is uploaded
shutil.rmtree(path_data)
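As a side note, if you do not need an open file object, the boto3 client also has upload_file, which takes a local path directly; a minimal variant of the loop above, using the same variables:
for file in os.listdir(path_data):
    s3.upload_file(
        os.path.join(path_data, file),   # local file path
        s3_Bucket_name,                  # bucket name
        folder_name + "/" + file,        # key: "folder" prefix plus filename
    )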

Boto3 - Multi-file upload to specific S3 bucket "path" using CLI arguments

New coder here. For work, I received a request to put certain files in an already established S3 bucket with a requested "path."
For example: "Create a path of (bucket name)/1/2/3/ with folder3 containing (requested files)"
I'm looking to create a Python 3 script that uploads multiple files from my local machine to a specified bucket and "path", using CLI arguments to specify the file(s), bucket name, and "path"/key. I understand S3 doesn't technically have a folder structure and that you have to put your "folders" in as part of the key, which is why I put "path" in quotes.
I have a working script doing what I want it to do, but the bucket/key is hard-coded at the moment, and I'm looking to get away from that with the use and understanding of CLI arguments. This is what I have so far; it just doesn't upload the file, though it builds the path in S3 successfully :/
EDIT: Below is the working version of what I was looking for!
import argparse
import boto3

def upload_to_s3(file_name, bucket, path):
    s3 = boto3.client('s3')
    s3.upload_file(file_name, bucket, path)  # path is the full S3 key, e.g. 1/2/3/test.txt

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('--file_name')
    parser.add_argument('--bucket')
    parser.add_argument('--path')
    args = parser.parse_args()
    upload_to_s3(args.file_name, args.bucket, args.path)
My input is:
>>> python3 s3_upload_args_experiment.py --file_name test.txt --bucket mybucket2112 --path 1/2/3/test.txt
Everything executes properly!
Thank you much!
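Since the title mentions multi-file upload, here is a hedged sketch that extends the script to several files using argparse's nargs='+'; the --file_names and --prefix flags and the basename-based key are assumptions, not part of the original script:
import argparse
import os
import boto3

def upload_many_to_s3(file_names, bucket, prefix):
    s3 = boto3.client('s3')
    for file_name in file_names:
        # key = requested "path" prefix plus the local file's basename
        key = prefix.rstrip('/') + '/' + os.path.basename(file_name)
        s3.upload_file(file_name, bucket, key)

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('--file_names', nargs='+')  # one or more local files
    parser.add_argument('--bucket')
    parser.add_argument('--prefix')                 # e.g. 1/2/3
    args = parser.parse_args()
    upload_many_to_s3(args.file_names, args.bucket, args.prefix)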

How can I generate a PDF with custom fonts using AWS Lambda?

I have an AWS Lambda function that generates PDFs using the html-pdf library with custom fonts.
At first, I imported my fonts externally from Google Fonts, but then the PDF's size grew by ten times.
So I tried to import my fonts locally, src('file:///var/task/fonts/...ttf/woff2'), but still no luck.
Lastly, I tried to create a fonts folder in the main project, added all of my fonts, plus the file fonts.config:
<?xml version="1.0"?>
<!DOCTYPE fontconfig SYSTEM "fonts.dtd">
<fontconfig>
<dir>/var/task/fonts/</dir>
<cachedir>/tmp/fonts-cache/</cachedir>
<config></config>
</fontconfig>
and set the following env:
FONTCONFIG_PATH = /var/task/fonts
but still no luck (I haven't installed fontconfig, since I'm not sure how to, or whether I need to).
My Runtime env is Node.js 8.1.0.
You can upload your fonts into an S3 bucket and then download them to the Lambda's /tmp directory during its execution. In case your lib creates .pkl files, you should first change your working directory to /tmp (Lambda is not allowed to write to its default working directory).
The following Python code downloads your files from a fonts/ prefix in an S3 bucket to the "local" /tmp/fonts directory.
import os
import boto3

os.chdir('/tmp')
os.makedirs('/tmp/fonts', exist_ok=True)

s3 = boto3.resource('s3')
s3_client = boto3.client('s3')

bucket_name = "bucket_name"  # your bucket's name
my_bucket = s3.Bucket(bucket_name)

for file in my_bucket.objects.filter(Prefix="fonts/"):
    filename = file.key
    short_filename = filename.replace('fonts/', '')
    if len(short_filename) > 0:  # skip the "fonts/" prefix entry itself
        s3_client.download_file(
            bucket_name,
            filename,
            "/tmp/fonts/" + short_filename,
        )
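Since /tmp persists across warm invocations of the same Lambda container, it can pay to guard against re-downloading on every call. A small sketch, assuming the download loop above is wrapped in a download_fonts() helper (a hypothetical name):
import os

FONTS_DIR = '/tmp/fonts'

def ensure_fonts():
    # /tmp survives warm starts, so only download on a cold start
    if not os.path.isdir(FONTS_DIR) or not os.listdir(FONTS_DIR):
        download_fonts()  # hypothetical wrapper around the S3 loop above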

How to open a file in Google App Engine using python 3.5?

I am able to load my txt file using the line below on my local machine.
lines=open(args['train_file1'],mode='r').read().split('\n')
args is a dict which holds the path of the training file.
Now I have changed the working Python version to 3.5 and I am getting the error below. I am clueless as to why, since the file is present at that path.
FileNotFoundError: [Errno 2] No such file or directory: 'gs://bot_chat-227711/data/movie_lines.txt'
If I understood your question correctly, you are trying to read a file from Cloud Storage in App Engine.
You cannot do so directly with the open function, as files in Cloud Storage are located in buckets in the cloud. Since you are using Python 3.5, you can use the Python client library for GCS in order to work with files located in GCS.
This is a small example that reads your file from your bucket, in a handler of an App Engine application:
from flask import Flask
from google.cloud import storage

app = Flask(__name__)

@app.route('/openFile')
def openFile():
    client = storage.Client()
    bucket = client.get_bucket('bot_chat-227711')
    blob = bucket.get_blob('data/movie_lines.txt')
    your_file_contents = blob.download_as_string()
    return your_file_contents

if __name__ == '__main__':
    app.run(host='127.0.0.1', port=8080, debug=True)
Note that you will need to add the line google-cloud-storage to your requirements.txt file in order to import and use this library.
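For reference, a minimal requirements.txt for the handler above might look like this (version pins omitted; add them as your project requires):
flask
google-cloud-storage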

Running Headless Chrome using Python 3.6 on AWS Lambda - Permissions Error

I have struggled to get Headless Chrome running on AWS Lambda for days. It works fine on EC2, but when I try it on Lambda, I just get "Message: 'chromedriver' executable may have wrong permissions."
The modules are zipped with the chromedriver and headless-chromium executables in the root directory of the zip file. The total zipped file I upload to S3 is 52 MB, but extracted it is below the 250 MB limit, so I don't think that is the issue.
(Image: zip folder structure of the Python deployment package)
from selenium import webdriver

def lambda_handler(event, context):
    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    options.add_argument("--disable-gpu")
    options.add_argument("--window-size=1280x1696")
    options.add_argument("--disable-application-cache")
    options.add_argument("--disable-infobars")
    options.add_argument("--no-sandbox")
    options.add_argument("--hide-scrollbars")
    options.add_argument("--enable-logging")
    options.add_argument("--log-level=0")
    options.add_argument("--v=99")
    options.add_argument("--single-process")
    options.add_argument("--ignore-certificate-errors")
    options.add_argument("--homedir=/tmp")
    options.binary_location = "/var/task/headless-chromium"
    driver = webdriver.Chrome("/var/task/chromedriver", chrome_options=options)
    driver.get("https://www.google.co.uk")
    title = driver.title
    driver.close()
    return title

if __name__ == "__main__":
    title = lambda_handler(None, None)
    print("title:", title)
A few posts on the web have reported compatibility issues that may cause problems, so I am using the specific executable versions of Chrome and ChromeDriver listed below, which others seem to have had success with on EC2 and by other means.
DOWNLOAD SOURCES FOR HEADLESS CHROME AND CHROMEDRIVER
(stable) https://github.com/adieuadieu/serverless-chrome/releases/tag/v1.0.0-37
(https://sites.google.com/a/chromium.org/chromedriver/downloads) download unavailable, so retrieved from the source below:
https://chromedriver.storage.googleapis.com/index.html?path=2.37/
Can anyone help me crack this?
I found a solution to this problem a few minutes ago.
When you use chromedriver in a Lambda function, it (I think) needs write permission, but when the driver file sits in the 'task' or 'opt' folder, you only get read permission.
The only folder whose permissions you can change in a Lambda function is 'tmp'.
So I moved the chromedriver file to the 'tmp' folder, and it works.
Like this:
os.system("cp ./chromedriver /tmp/chromedriver")
os.system("cp ./headless-chromium /tmp/headless-chromium")
os.chmod("/tmp/chromedriver", 0o777)
os.chmod("/tmp/headless-chromium", 0o777)
chrome_options.binary_location = "/tmp/headless-chromium"
driver = webdriver.Chrome(executable_path=r"/tmp/chromedriver",chrome_options=chrome_options)
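One practical note, as a hedged sketch: the copy and chmod calls have to run before webdriver.Chrome is constructed, typically at the top of lambda_handler, and because /tmp survives warm starts you can guard against repeating the copy (prepare_driver_binaries is an illustrative name, not from the answer):
import os
import shutil

def prepare_driver_binaries():
    # copy only on a cold start; /tmp persists across warm invocations
    if not os.path.exists("/tmp/chromedriver"):
        shutil.copy("/var/task/chromedriver", "/tmp/chromedriver")
        shutil.copy("/var/task/headless-chromium", "/tmp/headless-chromium")
        os.chmod("/tmp/chromedriver", 0o777)
        os.chmod("/tmp/headless-chromium", 0o777)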
