I am currently using the Python script below to download data from AWS S3 to my local machine. The only problem is that when I run it, I have to manually enter the exact folder the files should be downloaded from. The S3 bucket I use creates a new folder for each day, and I would like to download files from only the current day's folder. I tried creating a variable from the system date and passing it into the bucket list call, but the script did nothing and didn't throw an error either. Could anyone help me with this?
import boto, os
import datetime
from os import path
current_date = datetime.datetime.now().strftime("%Y-%m-%d")
LOCAL_PATH = '/Users/user/Desktop/rep'
AWS_ACCESS_KEY_ID = 'ACCESS'
AWS_SECRET_ACCESS_KEY = 'SECRET'
bucket_name = 'bucket'
# connect to the bucket
conn = boto.connect_s3(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
bucket = conn.get_bucket(bucket_name)
# go through the list of files
bucket_list = bucket.list(prefix='Nation/State/City/2018-05-01')
#bucket_list = bucket.list(prefix='Nation/State/City/current_date')
#bucket_list = bucket.list()
for l in bucket_list:
    keyString = str(l.key)
    d = LOCAL_PATH + keyString
    try:
        l.get_contents_to_filename(d)
    except OSError:
        # check if dir exists
        if not os.path.exists(d):
            os.makedirs(d)
Thanks..
Your Python code is wrong for what you want.
The error is here:
bucket_list = bucket.list(prefix='Nation/State/City/current_date')
In this context, current_date is just part of a literal string, so the prefix contains the text current_date rather than the value of your variable. To fix it, change the line above to:
bucket_list = bucket.list(prefix='Nation/State/City/{}'.format(current_date))
This line takes the value of the current_date variable and substitutes it into your prefix string, replacing the {}.
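If you are on Python 3.6 or newer, an f-string gives the same result. A minimal sketch, assuming the bucket object and datetime import from your script above:

current_date = datetime.datetime.now().strftime("%Y-%m-%d")

# .format() substitution, as shown above
bucket_list = bucket.list(prefix='Nation/State/City/{}'.format(current_date))

# equivalent f-string (Python 3.6+)
bucket_list = bucket.list(prefix=f'Nation/State/City/{current_date}')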
I would also recommend checking this link:
https://www.digitalocean.com/community/tutorials/how-to-use-string-formatters-in-python-3.
I am running into a problem trying to update an AWS GameLift script with a Python command that zips a directory and uploads it, with all its contents, as a new version to AWS GameLift.
from zipfile import ZipFile
import os
from os.path import basename
import boto3
import sys, getopt
def main(argv):
    versInput = sys.argv[1]

    # initializes client for updating script in aws gamelift
    client = boto3.client('gamelift')

    # where the directory is relative to the script directory; in this case, one folder down: the contents of the RealtimeServer dir
    dirName = '../RealtimeServer'

    # create a ZipFile object
    with ZipFile('RealtimeServer.zip', 'w') as zipObj:
        # iterate over all the files in the directory
        for folderName, subfolders, filenames in os.walk(dirName):
            rootlen = len(dirName) + 1
            for filename in filenames:
                # create complete filepath of file in directory
                filePath = os.path.join(folderName, filename)
                # add file to zip
                zipObj.write(filePath, filePath[rootlen:])

    response = client.update_script(
        ScriptId=SCRIPT_ID_GOES_HERE,
        Version=sys.argv[1],
        ZipFile=b'--zip-file \"fileb://RealtimeServer.zip\"'
    )

if __name__ == "__main__":
    main(sys.argv[1])
I plan on using it by giving it a new version number every time I make changes, with:
python updateScript.py "0.1.1"
This is meant to help speed up development. However, I am doing something wrong with the ZipFile parameter of client.update_script()
For context, I can use the AWS CLI directly from the commandline and update a script without a problem by using:
aws gamelift update-script --script-id SCRIPT_STRING_ID_HERE --script-version "0.4.5" --zip-file fileb://RealtimeServer.zip
However, I am not sure what is going on because it fails to unzip the file when I try it:
botocore.errorfactory.InvalidRequestException: An error occurred (InvalidRequestException) when calling the UpdateScript operation: Failed to unzip the zipped file.
UPDATE:
After reading more documentation about the ZipFile parameter:
https://docs.aws.amazon.com/gamelift/latest/apireference/API_UpdateScript.html
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/gamelift.html#GameLift.Client.update_script
I tried sending a base64-encoded version of the zip file. However, that didn't work. I put the following code before the client.update_script() call and used b64EncodedZip as the ZipFile parameter.
with open("RealtimeServer.zip", "rb") as f:
bytes = f.read()
b64EncodedZip = base64.b64encode(bytes)
I was able to get it to work with some help from a maintainer of boto3 over at https://github.com/boto/boto3/issues/2646
(Thanks @swetashre)
Here is the code. Note that it only works for zip files up to 5 MB; for anything larger you have to upload the zip to an S3 bucket and reference it from there.
from zipfile import ZipFile
import os
from os.path import basename
import boto3
import sys, getopt
def main(argv):
    versInput = sys.argv[1]

    # initializes client for updating script in aws gamelift
    client = boto3.client('gamelift')

    # where the directory is relative to the script directory; in this case, one folder down: the contents of the RealtimeServer dir
    dirName = '../RealtimeServer'

    # create a ZipFile object
    with ZipFile('RealtimeServer.zip', 'w') as zipObj:
        # iterate over all the files in the directory
        for folderName, subfolders, filenames in os.walk(dirName):
            rootlen = len(dirName) + 1
            for filename in filenames:
                # create complete filepath of file in directory
                filePath = os.path.join(folderName, filename)
                # add file to zip
                zipObj.write(filePath, filePath[rootlen:])

    with open('RealtimeServer.zip', 'rb') as f:
        contents = f.read()

    response = client.update_script(
        ScriptId="SCRIPT_ID_GOES_HERE",
        Version=sys.argv[1],
        ZipFile=contents
    )

if __name__ == "__main__":
    main(sys.argv[1])
I got the script working but I did it by avoiding the use of boto3. I don't like it but it works.
os.system("aws gamelift update-script --script-id \"SCRIPT_ID_GOES_HERE\" --script-version " + sys.argv[1] + " --zip-file fileb://RealtimeServer.zip")
If anyone knows how to get boto3 to work for updating an AWS Gamelift script then please let me know.
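For zips larger than 5 MB, one possible boto3 route is the StorageLocation parameter of update_script: upload the zip to S3 first and point GameLift at it. This is an untested sketch; the bucket name, key, and role ARN below are placeholders you would replace with your own:

import sys
import boto3

# Untested sketch: upload the zip to S3, then reference it via StorageLocation
# instead of sending the bytes inline (which is limited to 5 MB).
s3 = boto3.client('s3')
s3.upload_file('RealtimeServer.zip', 'my-gamelift-scripts', 'RealtimeServer.zip')  # placeholder bucket

client = boto3.client('gamelift')
response = client.update_script(
    ScriptId='SCRIPT_ID_GOES_HERE',
    Version=sys.argv[1],
    StorageLocation={
        'Bucket': 'my-gamelift-scripts',                                # placeholder
        'Key': 'RealtimeServer.zip',
        'RoleArn': 'arn:aws:iam::123456789012:role/gamelift-s3-access'  # placeholder
    }
)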
How do I check whether a particular file is present inside a particular directory in my S3 bucket? I use Boto3 and tried this code (which doesn't work):
import boto3
s3 = boto3.resource('s3')
bucket = s3.Bucket('my-bucket')
key = 'dootdoot.jpg'
objs = list(bucket.objects.filter(Prefix=key))
if len(objs) > 0 and objs[0].key == key:
    print("Exists!")
else:
    print("Doesn't exist")
When checking for an S3 folder, there are two scenarios:
Scenario 1
import boto3
def folder_exists_and_not_empty(bucket: str, path: str) -> bool:
    '''
    Folder should exist.
    Folder should not be empty.
    '''
    s3 = boto3.client('s3')
    if not path.endswith('/'):
        path = path + '/'
    resp = s3.list_objects(Bucket=bucket, Prefix=path, Delimiter='/', MaxKeys=1)
    return 'Contents' in resp
The code above uses MaxKeys=1, which makes it more efficient: even if the folder contains a lot of files, it responds quickly with just one of them.
Note that it checks for Contents in the response.
Scenario 2
import boto3
def folder_exists(bucket: str, path: str) -> bool:
    '''
    Folder should exist.
    Folder could be empty.
    '''
    s3 = boto3.client('s3')
    path = path.rstrip('/')
    resp = s3.list_objects(Bucket=bucket, Prefix=path, Delimiter='/', MaxKeys=1)
    return 'CommonPrefixes' in resp
Note that it strips the trailing / from the path. With that prefix, the check matches just the folder itself and does not look inside it.
Note also that it checks for CommonPrefixes in the response, not Contents.
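A minimal usage sketch for the two helpers above; the bucket name and prefix are placeholders:

# Placeholders: replace with your own bucket and folder prefix.
print(folder_exists_and_not_empty('my-bucket', 'Nation/State/City/2018-05-01'))  # True only if the folder has files
print(folder_exists('my-bucket', 'Nation/State/City/2018-05-01'))                # True even if the folder is empty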
import boto3
import botocore
client = boto3.client('s3')
def checkPath(file_path):
    result = client.list_objects(Bucket="Bucket", Prefix=file_path)
    exists = False
    if 'Contents' in result:
        exists = True
    return exists
If the provided file_path exists, the function returns True.
Example: for the object 's3://bucket/dir1/dir2/dir3/file.txt',
file_path could be 'dir1/dir2' or 'dir1/'.
Note: the file path should start with the first directory just after the bucket name.
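A short usage sketch based on the example path above (hypothetical calls; remember the bucket name is hard-coded as "Bucket" inside checkPath, so substitute your own):

# Assuming the object s3://bucket/dir1/dir2/dir3/file.txt exists
print(checkPath('dir1/dir2'))  # True: some key starts with this prefix
print(checkPath('dir1/'))      # True: the "folder" dir1 contains objects
print(checkPath('nope/'))      # False: no key starts with this prefix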
Basically, a directory/file in S3 is an object. I have created a method for this (IsObjectExists) that returns True or False. If the directory/file doesn't exist, the loop body is never entered and the method returns False; otherwise it returns True.
import boto3
s3 = boto3.resource('s3')
bucket = s3.Bucket('<givebucketnamehere>')
def IsObjectExists(path):
    for object_summary in bucket.objects.filter(Prefix=path):
        return True
    return False

if IsObjectExists("<giveobjectnamehere>"):
    print("Directory/File exists")
else:
    print("Directory/File doesn't exist")
Note that if you are checking a folder, make sure the string ends with /. For example, suppose you check for a folder called Hello and it doesn't exist, but a folder called Hello_World does; in that case the method returns True. To avoid this, append the / character to the end of the folder name, as handled in the example below.
foldername = "Hello/"
if(IsObjectExists(foldername))
print("Directory/File exists")
import boto3
import botocore
client = boto3.client('s3')
result = client.list_objects_v2(Bucket='athenards', Prefix='cxdata')
for obj in result['Contents']:
    if obj['Key'] == 'cxdata/':
        print("true")
Please try the following code.
Get subdirectory info folder:
folders = bucket.list("", "/")
for folder in folders:
    print(folder.name)
PS: reference URL (How to use python script to copy files from one bucket to another bucket at the Amazon S3 with boto)
The following code should work...
import boto3
import botocore
def does_exist(bucket_name, folder_name):
    s3 = boto3.resource(
        service_name='s3',
        region_name='us-east-2',
        aws_access_key_id='********************',
        aws_secret_access_key='********************'
    )
    objects = s3.meta.client.list_objects_v2(Bucket=bucket_name, Delimiter='/', Prefix='')
    # print(objects)
    folders = objects['CommonPrefixes']
    folders_in_bucket = []
    for f in folders:
        print(f['Prefix'])
        folders_in_bucket.append(f['Prefix'])
    return folder_name in folders_in_bucket

print("does it exist?", does_exist('images-bucket', 'ddd/'))
As @Vinayak mentioned in one of the answers' comments in March 2020...
The way to get a 'folder' list in boto3 is objects = s3.list_objects_v2(Bucket=BUCKET_NAME, Delimiter='/', Prefix='')
Running this with the latest versions of boto3 and botocore as of August 2021 ('1.18.27' and '1.21.27', respectively) gives the following error:
AttributeError: 's3.ServiceResource' object has no attribute 'list_objects_v2'
This happens because s3 was created as a resource (s3 = boto3.resource('s3', credential-params)), and an s3.ServiceResource object has no list_objects_v2() method. The ServiceResource does, however, have a meta attribute that exposes the underlying low-level client, so you can call the Client object's methods through it, like this: s3.meta.client.list_objects_v2()
Hope that helps!
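A small sketch of the difference, using a placeholder bucket name:

import boto3

s3 = boto3.resource('s3')

# s3.list_objects_v2(...)  # AttributeError: the resource itself has no such method

# go through the low-level client attached to the resource instead
response = s3.meta.client.list_objects_v2(Bucket='my-bucket', Delimiter='/', Prefix='')
for prefix in response.get('CommonPrefixes', []):
    print(prefix['Prefix'])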
Check this for verifying that a folder exists and is not empty:
import boto3

def folder_exists_and_not_empty(bucket_name: str, object_key: str) -> bool:
    '''
    Folder should exist.
    Folder should not be empty.
    '''
    if not object_key.endswith('/'):
        object_key = object_key + '/'
    s3 = boto3.resource("s3")
    bucket = s3.Bucket(bucket_name)
    current_object = [file.key for file in bucket.objects.filter(Prefix=object_key)
                      if file.key == object_key
                      and str(file.get()['ContentType']).startswith('application/x-directory')]
    list_files = [file.key for file in bucket.objects.filter(Prefix=object_key)
                  if file.key != object_key]
    return len(current_object) == 1 and len(list_files) > 0
I am trying to move files from one folder to another based on a date or time stamp: I want to keep today's file in the same folder and move yesterday's file into a different folder.
Currently I am able to move the files from one folder to the other, but not based on date or time.
The file name will look something like this.
"output-android_login_scenarios-android-1.43-9859-2019-04-30 11:29:31.542548.html"
def move(self, srcdir, dstdir):
    currentDirectory = os.path.dirname(__file__)
    sourceFile = os.path.join(currentDirectory, srcdir)
    destFile = os.path.join(currentDirectory, dstdir)
    if not os.path.exists(destFile):
        os.makedirs(destFile)
    source = os.listdir(sourceFile)
    try:
        for files in source:
            shutil.move(sourceFile + '/' + files, destFile)
    except:
        print("No files are present")
I think I have something that might work for you. I have made some minor tweaks to your "move" function, so I hope you don't mind. This method will also work if you have more than one 'old' file that needs moving.
Let me know if this helps :)
import os
import shutil
import re
import pandas as pd
from datetime import datetime

sourceDir = 'C:\\{folders in your directory}\\{folder containing the files}'
destDir = 'C:\\{folders in your directory}\\{folder containing the old files}'

files = os.listdir(sourceDir)

# build a table of file names and the dates embedded in them
list_of_DFs = []
for file in files:
    if file.endswith('.html'):
        name = file
        dateRegex = re.compile(r'\d{4}-\d{2}-\d{2}')
        date = dateRegex.findall(file)
        df = pd.DataFrame({'Name': name, 'Date': date})
        list_of_DFs.append(df)

filesDF = pd.concat(list_of_DFs, ignore_index=True)

today = datetime.today().strftime('%Y-%m-%d')
filesToMove = filesDF[filesDF['Date'] != today]

def move(file, sourceDir, destDir):
    sourceFile = os.path.join(sourceDir, file)
    if not os.path.exists(destDir):
        os.makedirs(destDir)
    try:
        shutil.move(sourceFile, destDir)
    except:
        print("No files are present")

# iterate over the names directly so the filtered index doesn't matter
for file in filesToMove['Name']:
    move(file, sourceDir, destDir)
I want to upload files from disk to an AWS S3 bucket while maintaining a different folder structure. I am able to reproduce the same structure as on disk, but I need a small change in it.
The folder structure on disk is: /home/userdata/uploaded_folder/uploaded_file
The folder(key) structure I want to maintain on aws bucket is:
/userdata/uploaded_folder/uploaded_file/
My current code is like this:
from boto.s3.connection import S3Connection
from boto.s3.key import Key
import os
conn = S3Connection()
path = '/home/userdata/'
bucket = conn.get_bucket('myBuck')
for root, dirs, files in os.walk(path):
    for name in files:
        #print(root)
        path = root.split(os.path.sep)[1:]
        path.append(name)
        #print(path)
        key_id = os.path.join(*path)
        k = Key(bucket)
        k.key = key_id
        #print(key_id)
        #k.set_contents_from_filename(os.path.join(root, name))
The above code reproduces the exact structure in the bucket. How do I make the change in the path?
Correct the line
path = root.split(os.path.sep)[1:]
to
path = root.split(os.path.sep)[2:]
The first element of root.split(os.path.sep) is '', not 'home', so slicing from index 1 keeps 'home' in the key, while slicing from index 2 drops it and starts the key at 'userdata'.
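For example, with the path from the question:

import os

root = '/home/userdata/uploaded_folder'
print(root.split(os.path.sep))       # ['', 'home', 'userdata', 'uploaded_folder']
print(root.split(os.path.sep)[2:])   # ['userdata', 'uploaded_folder']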
I want to transfer files from one S3 bucket path (say B1/x/*) to another S3 bucket path (say B2/y/*), where B1 and B2 are two S3 buckets and x and y are folders in them that contain csv files.
I have written the script below to do this, but I am getting the error `object_list` is not defined. Moreover, I am not sure whether it will actually transfer the files.
Refer the script below:
import boto3
s3 = boto3.client("s3")
# list_objects_v2() give more info
more_objects=True
found_token = True
while more_objects:
    if found_token:
        response = s3.list_objects_v2(
            Bucket="B1",
            Prefix="x/",
            Delimiter="/")
    else:
        response = s3.list_objects_v2(
            Bucket="B1",
            ContinuationToken=found_token,
            Prefix="x/",
            Delimiter="/")
    # use copy_object or copy_from
    for source in object_list["Contents"]:
        raw_name = source["Key"].split("/")[-1]
        new_name = "new_structure/{}".format(raw_name)
        s3.copy_from(CopySource='B1/x')
    # Now check if there are more objects to list
    if "NextContinuationToken" in response:
        found_token = response["NextContinuationToken"]
        more_objects = True
    else:
        more_objects = False
It would be really helpful if anyone could help me make the necessary changes to the above script.
Thanks
You can use the code below to transfer files from one bucket to another with a layered folder structure like yours. You won't have to define any specific key or folder structure; the code takes care of that:
import boto3
s3 = boto3.resource('s3')
src_bucket = s3.Bucket('bucket_name')
dest_bucket = s3.Bucket('bucket_name')
dest_bucket.objects.all().delete()  # optional: clean the destination bucket first

for obj in src_bucket.objects.all():
    s3.Object(dest_bucket.name, obj.key).put(Body=obj.get()["Body"].read())
If you want to clear your source bucket once the files are moved, you can
use src_bucket.objects.all().delete() at the end of your code to clean the
source bucket.
If your script is running on a local server and needs access to two buckets to transfer files from one S3 bucket to another, you can follow the code below. It creates a copy of the files in "bucket1" under the "sample" folder in "bucket2".
import boto3
s3 = boto3.resource('s3')
src_bucket = s3.Bucket('bucket1')
dest_bucket = s3.Bucket('bucket2')
for obj in src_bucket.objects.all():
    filename = obj.key.split('/')[-1]
    dest_bucket.put_object(Key='sample/' + filename, Body=obj.get()["Body"].read())
If you want to remove files from the source bucket after copying, the line below can be used inside the loop, after the copy:
s3.Object(src_bucket.name, obj.key).delete()
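If you want to stick closer to the question's original goal (copying B1/x/* to B2/y/* server-side, without downloading the object bodies), a hedged sketch using a paginator and copy_object might look like this; the bucket names and prefixes mirror the question's placeholders:

import boto3

s3 = boto3.client('s3')

# Untested sketch: server-side copy from B1/x/ to B2/y/, keeping file names.
paginator = s3.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket='B1', Prefix='x/'):
    for obj in page.get('Contents', []):
        raw_name = obj['Key'].split('/')[-1]
        if not raw_name:  # skip the folder placeholder key itself
            continue
        s3.copy_object(
            Bucket='B2',
            Key='y/' + raw_name,
            CopySource={'Bucket': 'B1', 'Key': obj['Key']}
        )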