Here is the scenario. I have an S3 bucket (e.g. daily-data-input) where daily files are written to a specific folder (e.g. s3://daily-data-input/data/test/). Whenever a file is written under the "test" folder, a copy should also be written to the "test_copy" folder in the same bucket. If "test_copy" does not exist, it should be created.
I am using an S3 event notification attached to a Lambda function (Python 3.7) that checks whether the "test_copy" key exists and creates it if not. I can create the "test_copy" folder successfully, but I can't get the S3 copy via boto3 to work.
Here is the code for reference:
import boto3
import os
import botocore

s3 = boto3.resource('s3')
s3_cli = boto3.client('s3')

def lambda_handler(event, context):
    bucket_name = event['Records'][0]['s3']['bucket']['name']
    bucket_key = event['Records'][0]['s3']['object']['key']
    file = os.path.basename(bucket_key)
    source_key_path = os.path.dirname(bucket_key)
    target_keypath = source_key_path + '_' + 'copy' + '/'
    target_bucket_key = target_keypath + file
    copy_source = {'Bucket': bucket_name, 'Key': bucket_key}
    try:
        s3.Object(bucket_name, target_keypath).load()
    except botocore.exceptions.ClientError as e:
        if e.response['Error']['Code'] == "404":
            # Create the key
            print("Creating target _copy folder")
            s3_cli.put_object(Bucket=bucket_name, Key=target_keypath)
            # copy the file
            # s3.copy_object(Bucket=bucket_name, Key=target_bucket_key, CopySource=copy_source)
        else:
            print("Something went wrong!!")
    else:
        print("Key exists!!")
        # s3.copy_object(Bucket=bucket_name, Key=target_bucket_key, CopySource=copy_source)
I tried s3.copy_object, s3_cli.meta.client.copy, and bucket.copy(), and none of them work. Please let me know if I am doing something wrong.
Here is one simple way to copy an object in S3 within a bucket:
import boto3
s3 = boto3.resource('s3')
bucket = 'mybucket'
src_key = 'data/test/cat.png'
dest_key = 'data/test_copy/cat.png'
s3.Object(bucket, dest_key).copy_from(CopySource=f'{bucket}/{src_key}')
Here is another, lower-level way to do the same thing:
import boto3
s3 = boto3.client('s3')
bucket = 'mybucket'
src_key = 'data/test/cat.png'
dest_key = 'data/test_copy/cat.png'
s3.copy_object(Bucket=bucket, CopySource={'Bucket': bucket, 'Key': src_key}, Key=dest_key)
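Applied to the Lambda handler from the question, the copy could look like the sketch below. The helper names `copy_key_for` and `handle_record` are made up for illustration, and the client is passed in as a parameter so the logic can be tried without AWS access. Two points this sketch relies on: S3 has no real folders, so the "test_copy/" prefix does not need to be created before copying into it, and object keys in S3 event notifications arrive URL-encoded, hence the `unquote_plus` call.

```python
import posixpath
from urllib.parse import unquote_plus


def copy_key_for(source_key, suffix="_copy"):
    """Map 'data/test/cat.png' to 'data/test_copy/cat.png' (hypothetical helper)."""
    folder, filename = posixpath.split(source_key)
    return f"{folder}{suffix}/{filename}"


def handle_record(record, s3_client):
    """Copy the object named in one S3 event record into the sibling *_copy prefix."""
    bucket = record["s3"]["bucket"]["name"]
    # Keys in S3 event notifications are URL-encoded (a space arrives as '+').
    source_key = unquote_plus(record["s3"]["object"]["key"])
    target_key = copy_key_for(source_key)
    s3_client.copy_object(
        Bucket=bucket,
        Key=target_key,
        CopySource={"Bucket": bucket, "Key": source_key},
    )
    return target_key


def lambda_handler(event, context):
    import boto3  # imported lazily so the helpers above can be tested offline

    s3_client = boto3.client("s3")
    return [handle_record(record, s3_client) for record in event["Records"]]
```

With a setup like this, the event notification should be limited to the data/test/ prefix. Otherwise the copy written to data/test_copy/ would trigger the function again.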
Related
I have attached my hard-coded Python program, which merges two JSON files from S3 by naming each one manually. Can someone please tell me how to read multiple JSON input files from the S3 bucket automatically? I know how to do this in plain Python by globbing *.json in the program's directory, but I don't understand how to do the same in AWS Lambda.
Python code:
import glob
import json

result = []
for f in glob.glob("*.json"):
    with open(f, "r") as infile:
        result += json.load(infile)

with open("merge.json", "w") as outfile:
    json.dump(result, outfile)
In Lambda I can do this for two files. Can someone please suggest how to do the same for all JSON files in the S3 bucket automatically? Thanks in advance.
import boto3
import json

s3_client = boto3.client("s3")
S3_BUCKET = 'bucket-for-json-files'

def lambda_handler(event, context):
    object_key = "sample1.json"  # replace object key
    file_content = s3_client.get_object(Bucket=S3_BUCKET, Key=object_key)["Body"].read()
    print(file_content)
    object_key2 = "sample2.json"  # replace object key
    file_content2 = s3_client.get_object(Bucket=S3_BUCKET, Key=object_key2)["Body"].read()
    print(file_content2)
    result = []
    result += json.loads(file_content)
    result += json.loads(file_content2)
    print(result)
I followed the syntax from the documentation, but I still get a timeout error.
import boto3

# Create a client
client = boto3.client('s3', region_name='us-east-1')
# Create a reusable Paginator
paginator = client.get_paginator('list_objects')
# Create a PageIterator from the Paginator
page_iterator = paginator.paginate(Bucket='bucket-for-json-files')
for page in page_iterator:
    print(page['Contents'])
Getting a timeout error:
import boto3

s3_client = boto3.client("s3")
S3_BUCKET = 'bucket-for-json-files'

def iterate_bucket_items(bucket):
    client = boto3.client('s3')
    paginator = client.get_paginator('list_objects_v2')
    page_iterator = paginator.paginate(Bucket=bucket)
    for page in page_iterator:
        if page['KeyCount'] > 0:
            for item in page['Contents']:
                yield item

# Pass the bucket name variable, not the string 'S3_BUCKET'
for i in iterate_bucket_items(bucket=S3_BUCKET):
    print(i)
I solved the issue with the help of @JeremyThompson. Here is my final code:
import json
import boto3
import glob

def lambda_handler(event, context):
    s3 = boto3.resource('s3')
    bucket = s3.Bucket('bucket-for-json-files')
    # Create a client
    client = boto3.client('s3', region_name='us-east-1')
    # Create a reusable Paginator
    paginator = client.get_paginator('list_objects')
    # Create a PageIterator from the Paginator
    page_iterator = paginator.paginate(Bucket='bucket-for-json-files')
    result = []
    for page in page_iterator:
        result += page['Contents']
    s3 = boto3.client('s3')
    bucket = 'bucket-for-json-files'
    merge = []
    lst = []
    for i in result:
        cmd = i['Key']
        print(cmd)
The above code prints the key of each object (JSON file) in the bucket.
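To go from listing keys to the merged result the question asks for, the listing can be combined with `get_object`. The following is only a sketch: the function names `iter_json_keys` and `merge_json_objects` are invented here, and it assumes, as the original two-file version does, that every JSON file contains a list. The client is passed in so the helpers can be run against a stub.

```python
import json


def iter_json_keys(s3_client, bucket, prefix=""):
    """Yield the key of every object under `prefix` whose name ends in .json."""
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # 'Contents' is absent from a page when nothing matched.
        for item in page.get("Contents", []):
            if item["Key"].endswith(".json"):
                yield item["Key"]


def merge_json_objects(s3_client, bucket, prefix=""):
    """Download each JSON object and concatenate the lists they contain."""
    merged = []
    for key in iter_json_keys(s3_client, bucket, prefix):
        body = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
        merged += json.loads(body)
    return merged


def lambda_handler(event, context):
    import boto3  # lazy import keeps the helpers testable without AWS

    return merge_json_objects(boto3.client("s3"), "bucket-for-json-files")
```

Filtering on the ".json" suffix plays the role that `glob.glob("*.json")` played in the local version.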
I want to download files from a particular S3 bucket based on each file's last-modified date.
I have researched how to connect with boto3, and there is plenty of code and documentation for downloading files without any conditions. I wrote this pseudocode:
import boto3

def download_file_s3(bucket_name, modified_date):
    # connect to the S3 resource
    s3 = boto3.resource('s3', aws_access_key_id='demo', aws_secret_access_key='demo')
    # connect to the desired bucket
    my_bucket = s3.Bucket(bucket_name)
    # get files
    for file in my_bucket.objects.all():
        ...  # TODO: keep only the files whose last-modified date matches modified_date
I want to complete this function: given a modified date, it should return the files in the S3 bucket with that last-modified date.
Here is a function that does this automatically for yesterday's date. Just pass in the bucket name and the download path.
from boto3.session import Session
from datetime import date, timedelta
import boto3
import re

def Download_pdf_specifc_date_subfolder(bucket_name, download_path):
    ACCESS_KEY = 'XYZ'
    SECRET_KEY = 'ABC'
    Bucket_name = bucket_name
    # create a session
    session = Session(aws_access_key_id=ACCESS_KEY,
                      aws_secret_access_key=SECRET_KEY)
    s3 = session.resource('s3')
    bucket = s3.Bucket(Bucket_name)
    # get yesterday's date as YYYY-MM-DD
    yesterday = date.today() - timedelta(days=1)
    x = yesterday.strftime('%Y-%m-%d')
    print(x)
    # list of the files that need to be downloaded
    files_to_downloaded = []
    # go through all the files in the bucket
    for fileObject in bucket.objects.all():
        file_name = str(fileObject.key)
        last_modified = str(fileObject.last_modified)
        last_modified = last_modified.split()
        if last_modified[0] == x:
            # Replace "Airports" in the regex with the subfolder you want to filter on
            if re.findall(r"Airports/[a-zA-Z]+", file_name):
                files_to_downloaded.append(file_name)
    # download into the given folder
    for fileObject in bucket.objects.all():
        file_name = str(fileObject.key)
        if file_name in files_to_downloaded:
            print(file_name)
            d_path = download_path + file_name
            print(d_path)
            bucket.download_file(file_name, d_path)

# bucket_name and download_path must be defined by the caller
Download_pdf_specifc_date_subfolder(bucket_name, download_path)
The function downloads the matching files into the specified folder.
Here is my test code. It prints the last_modified datetime of every object modified after the datetime I set.
import boto3
from datetime import datetime
from datetime import timezone

s3 = boto3.resource('s3')
response = s3.Bucket('<bucket name>').objects.all()
for item in response:
    obj = s3.Object(item.bucket_name, item.key)
    if obj.last_modified > datetime(2019, 8, 1, 0, 0, 0, tzinfo=timezone.utc):
        print(obj.last_modified)
If you have a specific date, then
import boto3
from datetime import datetime, timezone

s3 = boto3.resource('s3')
response = s3.Bucket('<bucket name>').objects.all()
date = '20190827'  # input('Insert Date as a form YYYYmmdd')
for item in response:
    obj = s3.Object(item.bucket_name, item.key)
    if obj.last_modified.strftime('%Y%m%d') == date:
        print(obj.last_modified)
will give the results as follows.
2019-08-27 07:13:04+00:00
2019-08-27 07:13:36+00:00
2019-08-27 07:13:39+00:00
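The same date filter can be written against the listing itself, since each listed entry already carries its LastModified value, whereas reading last_modified from a fresh `s3.Object(...)` issues one extra request per key. This is a sketch rather than a drop-in replacement: `keys_modified_on`, `list_bucket`, and `print_keys_for_day` are names invented for this example, and the comparison is done in UTC.

```python
from datetime import date, datetime, timezone


def keys_modified_on(objects, day):
    """Return the keys of listing entries whose LastModified falls on `day` (UTC)."""
    return [
        obj["Key"]
        for obj in objects
        if obj["LastModified"].astimezone(timezone.utc).date() == day
    ]


def list_bucket(s3_client, bucket):
    """Yield every listing entry in the bucket, following pagination."""
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        yield from page.get("Contents", [])


def print_keys_for_day(bucket, day):
    import boto3  # only needed when talking to a real bucket

    client = boto3.client("s3")
    for key in keys_modified_on(list_bucket(client, bucket), day):
        print(key)
```

Calling `print_keys_for_day('<bucket name>', date(2019, 8, 27))` would print the keys rather than the timestamps, which is what a download step needs.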
I edited this answer to download all files after a certain timestamp and then write the current time to a file for use in the next iteration. You can easily adapt this to download only the files from a specific date, month, year, yesterday, etc.
import os
import boto3
import datetime
import pandas as pd

### Load AWS Key, Secret and Region
# ....
###

# Open file to read last download time and update file with current time
latesttime_file = "latest request.txt"
with open(latesttime_file, 'r') as f:
    latest_download = pd.to_datetime(f.read(), utc=True)
with open(latesttime_file, 'w') as f:
    f.write(str(datetime.datetime.utcnow()))

# Initialize S3-client
s3_client = boto3.client('s3',
                         region_name=AWS_REGION,
                         aws_access_key_id=AWS_KEY_ID,
                         aws_secret_access_key=AWS_SECRET)

def download_dir(prefix, local, bucket, timestamp, client=s3_client):
    """
    params:
    - prefix: pattern to match in s3
    - local: local path to folder in which to place files
    - bucket: s3 bucket with target contents
    - timestamp: only keys modified after this (timezone-aware) time are downloaded
    - client: initialized s3 client object
    """
    keys = []
    dirs = []
    next_token = ''
    base_kwargs = {
        'Bucket': bucket,
        'Prefix': prefix,
    }
    while next_token is not None:
        kwargs = base_kwargs.copy()
        if next_token != '':
            kwargs.update({'ContinuationToken': next_token})
        results = client.list_objects_v2(**kwargs)
        # 'Contents' is missing when nothing matches the prefix
        contents = results.get('Contents', [])
        for i in contents:
            k = i.get('Key')
            t = i.get('LastModified')
            if k[-1] != '/':
                if t > timestamp:
                    keys.append(k)
            else:
                dirs.append(k)
        next_token = results.get('NextContinuationToken')
    for d in dirs:
        dest_pathname = os.path.join(local, d)
        if not os.path.exists(os.path.dirname(dest_pathname)):
            os.makedirs(os.path.dirname(dest_pathname))
    for k in keys:
        dest_pathname = os.path.join(local, k)
        if not os.path.exists(os.path.dirname(dest_pathname)):
            os.makedirs(os.path.dirname(dest_pathname))
        client.download_file(bucket, k, dest_pathname)

download_dir(<prefix or ''>, <local folder to download to>, <bucketname>, latest_download)
I'm writing a script to parse files in S3 buckets without needing to download them locally. The code seems to work as long as it doesn't encounter Glacier files. I'm adding an exception for now (error handling looks better in the actual code, I promise), but ideally I'd like to know whether it's possible to filter Glacier files out.
Here is my code:
import boto3
import gzip
import os

try:
    s3_client = boto3.client('s3')
    bucket = 'my_bucket'
    prefix = 'path_to_file/file_name.csv.gz'
    obj = s3_client.get_object(Bucket=bucket, Key=prefix)
    body = obj['Body']
    with gzip.open(body, 'rt') as gf:
        for ln in gf:
            print(ln)
except Exception as e:
    print(e)
I see that with the AWS CLI I can at least sort files so that the Glacier files are at the bottom, so there must be a way to either sort them or filter them out in boto3:
aws s3api list-objects --bucket my-bucket --query "reverse(sort_by(Contents,&LastModified))"
Solved using StorageClass == 'STANDARD' (vs == 'GLACIER'):
import boto3
import gzip

bucket = 'my_bucket'
prefix = 'path/to/files/'
s3_client = boto3.client('s3')
response = s3_client.list_objects(Bucket=bucket, Prefix=prefix)
for file in response['Contents']:
    if file['StorageClass'] == 'STANDARD':
        name = file['Key'].rsplit('/', 1)
        if name[1] != '':
            file_name = name[1]
            obj = s3_client.get_object(Bucket=bucket, Key=prefix + file_name)
            body = obj['Body']
            # read only the first 10 lines of each file
            lns = []
            i = 0
            with gzip.open(body, 'rt') as gf:
                for ln in gf:
                    i += 1
                    lns.append(ln.rstrip())
                    if i == 10:
                        break
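A variation on the filter above, offered as a sketch: comparing against 'STANDARD' also skips classes such as STANDARD_IA that can be read directly, and a single `list_objects` call returns at most 1000 keys. The version below instead excludes only the archive classes and paginates. The names `ARCHIVE_CLASSES`, `readable_keys`, and `iter_readable_keys` are invented for this example, and which classes to exclude is an assumption you may want to adjust.

```python
# Storage classes that must be restored before the object can be read.
ARCHIVE_CLASSES = {"GLACIER", "DEEP_ARCHIVE"}


def readable_keys(contents, skip_classes=ARCHIVE_CLASSES):
    """Return keys of real objects (not 'folder' markers) outside archive storage."""
    return [
        item["Key"]
        for item in contents
        # Treat a missing StorageClass as STANDARD (an assumption of this sketch).
        if item.get("StorageClass", "STANDARD") not in skip_classes
        and not item["Key"].endswith("/")
    ]


def iter_readable_keys(s3_client, bucket, prefix):
    """Yield readable keys under `prefix`, following pagination."""
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        yield from readable_keys(page.get("Contents", []))
```

Each yielded key can be passed straight to `get_object` without rebuilding it from the prefix and file name.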
import boto3
import cv2
import numpy as np
s3 = boto3.resource('s3')
vid = (s3.Object('bucketname', 'video.blob').get()['Body'].read())
cap = cv2.VideoCapture(vid)
This is my code. I have a video file in an S3 bucket. I want to process it with OpenCV without downloading it, so I'm trying to read the video file into vid. The problem is that type(vid) is bytes, which results in this error on line 6: TypeError: an integer is required (got type bytes). I tried converting it into an integer or a string but was unable to.
On an attempt to convert the bytes to an integer: I referred to this and was getting length issues. This is just a sample video file. The actual file I want to process will be huge when converted to a bytes object.
On an attempt to get the object as a string and then convert it to an integer: I referred to this. Even this doesn't seem to work for me.
If anyone can help me solve this issue, I will be grateful. Please comment if anything about my issue is unclear and I'll try to provide more details.
If streaming the video from a url is an acceptable solution, I think that is the easiest solution. You just need to generate a url to read the video from.
import boto3
import cv2
s3_client = boto3.client('s3')
bucket = 'bucketname'
key = 'video.blob'
url = s3_client.generate_presigned_url('get_object',
                                       Params={'Bucket': bucket, 'Key': key},
                                       ExpiresIn=600)  # this URL will be valid for 600 seconds
cap = cv2.VideoCapture(url)
ret, frame = cap.read()
You should see that you are able to read and process frames from that url.
The code snippets below perform various operations on an S3 bucket.
import boto3
s3 = boto3.resource('s3', region_name='us-east-2')
Listing buckets in S3
for bucket in s3.buckets.all():
    print(bucket.name)
Creating a bucket in S3
my_bucket = s3.create_bucket(Bucket='Bucket Name', CreateBucketConfiguration={
    'LocationConstraint': 'us-east-2'
})
Listing objects inside a bucket
my_bucket = s3.Bucket('Bucket Name')
for file in my_bucket.objects.all():
    print(file.key)
Uploading a file from current directory
import os
print(os.getcwd())
fileName = "B01.jpg"
bucketName = "Bucket Name"
# upload_file takes the local file name, so the file does not need to be opened first
s3.meta.client.upload_file(fileName, bucketName, 'test2.txt')
Reading an image/video from a bucket
import matplotlib.pyplot as plt
s3 = boto3.resource('s3', region_name='us-east-2')
bucket = s3.Bucket('Bucket Name')  # bucket name
object = bucket.Object('maisie_williams.jpg')  # image name
object.download_file('B01.jpg')  # download the image under this name
img = plt.imread('B01.jpg')  # read the downloaded image
imgplot = plt.imshow(img)  # plot the image
plt.show()
Reading from one bucket and then dumping it to another
import boto3
s3 = boto3.resource('s3', region_name='us-east-2')
bucket = s3.Bucket('Bucket Name')  # source bucket name
object = bucket.Object('maisie_williams.jpg')  # image name
object.download_file('B01.jpg')
fileName = "B01.jpg"
bucketName = "Bucket Name"  # destination bucket name
s3.meta.client.upload_file(fileName, bucketName, 'testz.jpg')
If you have access keys, then you can probably do the following:
import boto3
import pandas as pd

keys = pd.read_csv('accessKeys.csv')
# creating a Session for S3 buckets
session = boto3.session.Session(aws_access_key_id=keys['Access key ID'][0],
                                aws_secret_access_key=keys['Secret access key'][0])
s3 = session.resource('s3')
buck = s3.Bucket('Bucket Name')
I am trying to download a file from AWS S3 to my local machine. However, when I run the code below I get an error saying "IOError: [Errno 2] No such file or directory:"
import boto3
from botocore.client import Config

ACCESS_KEY_ID = '###'
ACCESS_SECRET_KEY = '###'
BUCKET_NAME = 'abc.helper'
FILE_NAME = 'k.png'
data = open(FILE_NAME, 'rb')

#### S3 Connect:
s3 = boto3.resource(
    's3',
    aws_access_key_id=ACCESS_KEY_ID,
    aws_secret_access_key=ACCESS_SECRET_KEY,
    config=Config(signature_version='s3v4')
)

#### Image download:
s3.Bucket(BUCKET_NAME).download_file(FILE_NAME, '/Users/kevin/desktop')
print("Done")
I have hidden the access key and secret key for obvious reasons. Could anyone help me find where the error is? Thanks.
Maybe this will help you:
import boto3
import botocore

BUCKET_NAME = 'my-bucket'  # replace with your bucket name
KEY = 'my_image_in_s3.jpg'  # replace with your object key

s3 = boto3.resource('s3')
try:
    s3.Bucket(BUCKET_NAME).download_file(KEY, 'my_local_image.jpg')
except botocore.exceptions.ClientError as e:
    if e.response['Error']['Code'] == "404":
        print("The object does not exist.")
    else:
        raise
This link covers it in more detail: s3-example-download-file
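A likely cause of the "No such file or directory" error in the question, offered as an assumption since the full traceback isn't shown: `open(FILE_NAME, 'rb')` tries to read a local k.png that has not been downloaded yet, and `download_file` expects its second argument to be the path of the local file to write, not a directory. The sketch below builds a full file path first; `local_path_for` and `download_to_directory` are hypothetical helper names.

```python
import os


def local_path_for(key, directory):
    """Build the local file path for an S3 key and make sure its folder exists."""
    os.makedirs(directory, exist_ok=True)
    return os.path.join(directory, os.path.basename(key))


def download_to_directory(bucket_name, key, directory):
    """Download one object into `directory`, keeping the key's file name."""
    import boto3  # imported here so local_path_for can be tried without AWS access

    target = local_path_for(key, directory)
    boto3.resource("s3").Bucket(bucket_name).download_file(key, target)
    return target
```

With the values from the question, `download_to_directory('abc.helper', 'k.png', '/Users/kevin/desktop')` would write to /Users/kevin/desktop/k.png.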