How to download all the files from S3 bucket irrespective of file key using python - python-3.x

I am working on an automation piece where I need to download all files from a folder inside an S3 bucket, irrespective of the file names. I understand that using boto3 in Python I can download a single file like this:
s3BucketObj = boto3.client('s3', region_name=awsRegion, aws_access_key_id=s3AccessKey, aws_secret_access_key=s3SecretKey)
s3BucketObj.download_file(bucketName, "abc.json", "/tmp/abc.json")
but I was then trying to download all files, without specifying each file name, in this way:
s3BucketObj.download_file(bucketName, "test/*.json", "/test/")
I know the syntax above could be totally wrong, but is there a simple way to do that?
I did find a thread that helps here, but it seems a bit complex: Boto3 to download all files from a S3 Bucket

There is no API call to Amazon S3 that can download multiple files.
The easiest way is to use the AWS Command-Line Interface (CLI), which has aws s3 cp --recursive and aws s3 sync commands. It will do everything for you.
If you choose to program it yourself, then Boto3 to download all files from a S3 Bucket is a good way to do it. This is because you need to do several things:
Loop through every object (there is no S3 API to copy multiple files)
Create a local directory if it doesn't exist
Download the object to the appropriate local directory
The task can be made simpler if you do not wish to reproduce the directory structure (eg if all objects are in the same path). In that case, you can simply loop through the objects and download each of them to the same directory.
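The simpler, flattened case described above can be sketched as follows. This is a minimal sketch, not a definitive implementation: download_prefix is a hypothetical helper name, and it assumes you pass in a boto3 S3 client such as the s3BucketObj created in the question. It relies on the client's list_objects_v2 paginator and download_file call.

```python
import os


def download_prefix(s3_client, bucket_name, prefix, dest_dir):
    """Download every object under `prefix` into `dest_dir`, flattening paths.

    `s3_client` is assumed to be a boto3 S3 client (hypothetical helper).
    """
    os.makedirs(dest_dir, exist_ok=True)
    downloaded = []
    # A single list call returns a limited number of keys, so paginate.
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket_name, Prefix=prefix):
        # A page has no "Contents" entry when nothing matches the prefix.
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith("/"):
                continue  # skip zero-byte "folder" placeholder objects
            local_path = os.path.join(dest_dir, os.path.basename(key))
            s3_client.download_file(bucket_name, key, local_path)
            downloaded.append(local_path)
    return downloaded
```

With the names from the question, a call might look like download_prefix(s3BucketObj, bucketName, "test/", "/tmp/test"). Because the paths are flattened, two objects with the same base name under different sub-paths would overwrite each other locally.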

Related

How to copy a folder containing files and sub-folders to an AWS S3 bucket with the same folder structure using Python

Hi all, can you please help me figure out this issue?
How can I copy a folder that contains .py files, as well as sub-folders with their own files, from a particular local path to an AWS S3 bucket path while keeping the same folder structure? The files should appear in the S3 bucket path at the same folder levels as they do locally.
There is a fully fledged Python library for this. Install it via
pip install s3
The documentation is well written and you should have no trouble following it. The examples section shows how to upload a file to S3:
storage.write("example", remote_name, headers=headers)
You should be able to instruct the package to upload the folders and subfolders while maintaining their folder structure. You can also use os.walk to walk through your files and directories if you need to individually pick which files and folders need to be uploaded.
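The os.walk approach mentioned above can be sketched like this. Note the swap: this sketch uses a boto3 S3 client and its upload_file call rather than the s3 package named in the answer, and upload_tree is a hypothetical helper name. It assumes the client is already configured with credentials.

```python
import os


def upload_tree(s3_client, local_root, bucket_name, key_prefix=""):
    """Upload every file under `local_root`, mirroring the folder layout in S3.

    `s3_client` is assumed to be a boto3 S3 client (hypothetical helper).
    """
    uploaded = []
    for dirpath, _dirnames, filenames in os.walk(local_root):
        for filename in filenames:
            local_path = os.path.join(dirpath, filename)
            # Path relative to the root becomes the object key, using "/"
            # separators so sub-folders show up as key prefixes in S3.
            key = os.path.relpath(local_path, local_root).replace(os.sep, "/")
            if key_prefix:
                key = key_prefix.rstrip("/") + "/" + key
            s3_client.upload_file(local_path, bucket_name, key)
            uploaded.append(key)
    return sorted(uploaded)
```

S3 has no real directories, so nothing needs to be created up front: the folder structure exists only through the "/" characters in each key.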

Append files to existing S3 bucket folder via Spark

I am working in Spark, where we need to write data to an S3 bucket after performing some transformations. I know that writing data to HDFS/S3 via Spark throws an exception if the folder path already exists. So in our case, if S3://bucket_name/folder already exists when we write data to that same S3 bucket path, it will throw an exception.
One possible solution is to use OVERWRITE mode when writing through Spark, but that would delete all the files already present in the folder. I want a kind of APPEND functionality for the same folder: if the folder already has some files, the write should just add more files to it.
I am not sure whether the API provides any such functionality out of the box. Of course, there is the option of creating a temporary folder inside the folder, saving the file there, then moving the file to its parent folder and deleting the temporary folder. But this kind of approach is not ideal.
So please suggest how to proceed with this.

Get local Files from Python lambda Function

I want to get a file (or a list of files) located on my local machine using a Python Lambda function.
I'm using the os library. The code works when I run it locally, but when I run it in AWS my code does not detect the file.
I need to check a folder and, if it contains a file, upload that file to S3. This process (verification and upload) will run on a schedule.
A batch file is not an option.
Thanks for the help.
It appears you are referring to: Access data sources on premises - Azure Logic Apps | Microsoft Docs
No, there is no equivalent for AWS Lambda functions.
An AWS Lambda function can access services on the Internet (eg make API calls, access websites), but you would need to code that yourself.

How can I extract a tar.gz file in a Google Cloud Storage bucket from a Colab Notebook?

As the question states, I'm trying to figure out how I can extract a .tar.gz file that is stored in a GCS Bucket from a Google Colab notebook.
I am able to connect to my bucket via:
auth.authenticate_user()
project_id = 'my-project'
!gcloud config set project {project_id}
However, when I try running a command such as:
!gsutil tar xvzf my-bucket/compressed-files.tar.gz
I get an error. I know that gsutil probably has limited functionality and maybe isn't meant to do what I'm trying to do, so is there a different way to do it?
Thanks!
Google Cloud Storage (GCS) does not natively support unpacking a tar archive. You will have to do this yourself, for instance on your local machine or from a Compute Engine VM.
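In a Colab notebook, the do-it-yourself route could look like the following. This is a sketch under assumptions: the archive is first copied from the bucket to the notebook's local disk, for example with !gsutil cp gs://my-bucket/compressed-files.tar.gz . and then unpacked with Python's standard tarfile module. extract_tar_gz is a hypothetical helper name.

```python
import os
import tarfile


def extract_tar_gz(archive_path, dest_dir):
    """Extract a local .tar.gz archive into `dest_dir` and return member names.

    Assumes the archive was already copied out of the bucket to local disk.
    Only use this on archives you trust.
    """
    os.makedirs(dest_dir, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        names = archive.getnames()
        archive.extractall(dest_dir)
    return names
```

If the extracted files need to live in the bucket, they would have to be copied back afterwards, since the extraction itself happens on the notebook's disk and not inside GCS.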
You can create a Dataflow process from a template to decompress a file in your bucket.
The template is called Bulk decompress Cloud Storage files.
You have to specify the file location, output location, failure log, and tmp location.
This worked for me. I'm new to colab and python itself so I'm not certain this is the solution.
!sudo tar -xvf my-bucket/compressed-files.tar.gz

How can I have Azure File Share automatically generate non-existing directories?

With AWS S3, I can upload a file test.png to any directory I like, regardless of whether or not it exists... because S3 will automatically generate the full path & directories.
For example, if I upload to S3 using the path this/is/a/new/home/for/test.png, S3 will create directories this, is, a, ... and upload test.png to the correct folder.
I am migrating over to Azure, and I am looking to use their file storage. However, it seems that I must manually create EVERY directory... I could obviously do it programmatically by checking to see if the folder exists and if not, create it... but wow...why should I work so hard?
I did try:
file_service.create_file_from_path('testshare', 'some/long/path', 'test.png', 'path/to/local/location/of/test.png')
However, that complains that the directory does not exist... and will only work if I either manually create the directories or replace some/long/path with None.
Is it possible to just hand Azure a path and have it create the directories?
Azure Files closely mimics an OS file system, so in order to push a file into a directory, that directory must exist. This means that if you need to create a file in a nested directory structure, the whole directory structure must already exist. The Azure File service will not create that structure for you.
A better option in your scenario would be to use Azure Blob Storage. It closely mimics Amazon S3 behavior that you mentioned above. You can create a Container (similar to Bucket in S3) and then upload a file with a name like this/is/a/new/home/for/test.png.
However, please note that the folders are virtual in Blob Storage (same as in S3) and not real ones. Essentially, the name under which the blob (similar to an Object in S3) will be saved is this/is/a/new/home/for/test.png.
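If you stay with Azure Files, the programmatic route mentioned in the question can be kept short. The sketch below is an assumption-laden illustration: parent_paths and ensure_directories are hypothetical helper names, and file_service is assumed to be the same legacy FileService object used in the question, whose create_directory(share_name, directory_name) call is assumed not to raise when the directory already exists (the legacy SDK's default as far as I know; check it against your SDK version).

```python
def parent_paths(directory_path):
    """Return each cumulative directory level, outermost first."""
    parts = [part for part in directory_path.split("/") if part]
    return ["/".join(parts[: i + 1]) for i in range(len(parts))]


def ensure_directories(file_service, share_name, directory_path):
    """Create every level of `directory_path` on the share, top down.

    Assumption: `file_service.create_directory` tolerates directories that
    already exist, so no separate existence check is made here.
    """
    created = parent_paths(directory_path)
    for path in created:
        file_service.create_directory(share_name, path)
    return created
```

Calling ensure_directories(file_service, 'testshare', 'some/long/path') before create_file_from_path would then create some, some/long, and some/long/path in that order.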
